Overview


Dataset statistics

Number of variables: 70
Number of observations: 724508
Missing cells: 30334160
Missing cells (%): 59.8%
Total size in memory: 386.9 MiB
Average record size in memory: 560.0 B
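
The derived figures above can be cross-checked from the raw counts. The sketch below is illustrative only. It assumes that memory is counted as one 8-byte object reference per cell; that is an inference from the numbers, not something the report states.

```python
# Cross-check the "Dataset statistics" figures from the raw counts.
n_vars = 70
n_obs = 724_508
missing_cells = 30_334_160

total_cells = n_vars * n_obs
missing_pct = 100 * missing_cells / total_cells

# Assumption: each cell is stored as an 8-byte object reference.
BYTES_PER_CELL = 8
record_bytes = n_vars * BYTES_PER_CELL
total_mib = total_cells * BYTES_PER_CELL / 2**20

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {record_bytes:.1f} B")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Under that assumption the reported sizes would describe the table of references only, not the string contents themselves.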

Variable types

Text: 70

Dataset

Description: NMNH Paleobiology Specimen Records (USNM) 0049391-241126133413365
URL: https://doi.org/10.15468/dl.ws2uf3

Alerts

institutionID has constant value "http://biocol.org/urn:lsid:biocol.org:col:34871" Constant
collectionID has constant value "urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac" Constant
institutionCode has constant value "USNM" Constant
collectionCode has constant value "PAL" Constant
datasetName has constant value "NMNH Paleobiology (USNM)" Constant
basisOfRecord has constant value "FossilSpecimen" Constant
verbatimCoordinateSystem has constant value "Degrees Minutes Seconds" Constant
catalogNumber has 50535 (7.0%) missing values Missing
recordNumber has 675939 (93.3%) missing values Missing
recordedBy has 563497 (77.8%) missing values Missing
preparations has 591600 (81.7%) missing values Missing
associatedMedia has 637195 (87.9%) missing values Missing
occurrenceRemarks has 638259 (88.1%) missing values Missing
fieldNumber has 720044 (99.4%) missing values Missing
eventDate has 453741 (62.6%) missing values Missing
startDayOfYear has 571939 (78.9%) missing values Missing
endDayOfYear has 571953 (78.9%) missing values Missing
year has 453741 (62.6%) missing values Missing
month has 571556 (78.9%) missing values Missing
day has 593848 (82.0%) missing values Missing
verbatimEventDate has 445814 (61.5%) missing values Missing
locationID has 335037 (46.2%) missing values Missing
higherGeography has 148417 (20.5%) missing values Missing
continent has 210428 (29.0%) missing values Missing
waterBody has 696851 (96.2%) missing values Missing
islandGroup has 723710 (99.9%) missing values Missing
island has 714401 (98.6%) missing values Missing
country has 173269 (23.9%) missing values Missing
stateProvince has 226462 (31.3%) missing values Missing
county has 454433 (62.7%) missing values Missing
locality has 560871 (77.4%) missing values Missing
verbatimElevation has 724311 (> 99.9%) missing values Missing
verbatimDepth has 724424 (> 99.9%) missing values Missing
decimalLatitude has 620569 (85.7%) missing values Missing
decimalLongitude has 620569 (85.7%) missing values Missing
geodeticDatum has 698201 (96.4%) missing values Missing
verbatimLatitude has 724503 (> 99.9%) missing values Missing
verbatimLongitude has 724503 (> 99.9%) missing values Missing
verbatimCoordinateSystem has 654265 (90.3%) missing values Missing
georeferenceProtocol has 695012 (95.9%) missing values Missing
georeferenceRemarks has 724503 (> 99.9%) missing values Missing
earliestEraOrLowestErathem has 220036 (30.4%) missing values Missing
latestEraOrHighestErathem has 718163 (99.1%) missing values Missing
earliestPeriodOrLowestSystem has 245750 (33.9%) missing values Missing
latestPeriodOrHighestSystem has 718167 (99.1%) missing values Missing
earliestEpochOrLowestSeries has 376914 (52.0%) missing values Missing
latestEpochOrHighestSeries has 718290 (99.1%) missing values Missing
earliestAgeOrLowestStage has 562472 (77.6%) missing values Missing
latestAgeOrHighestStage has 722133 (99.7%) missing values Missing
group has 633218 (87.4%) missing values Missing
formation has 365706 (50.5%) missing values Missing
member has 643191 (88.8%) missing values Missing
typeStatus has 581882 (80.3%) missing values Missing
identifiedBy has 521981 (72.0%) missing values Missing
scientificName has 171332 (23.6%) missing values Missing
higherClassification has 172643 (23.8%) missing values Missing
kingdom has 172847 (23.9%) missing values Missing
phylum has 211856 (29.2%) missing values Missing
class has 235611 (32.5%) missing values Missing
order has 400004 (55.2%) missing values Missing
family has 409455 (56.5%) missing values Missing
genus has 197061 (27.2%) missing values Missing
subgenus has 702202 (96.9%) missing values Missing
specificEpithet has 197674 (27.3%) missing values Missing
infraspecificEpithet has 708037 (97.7%) missing values Missing
taxonRank has 707802 (97.7%) missing values Missing
scientificNameAuthorship has 325030 (44.9%) missing values Missing
gbifID has unique values Unique
occurrenceID has unique values Unique
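
The three alert kinds listed above (Constant, Missing, Unique) can be approximated with a few lines of plain Python. This is a hypothetical sketch, not ydata-profiling's implementation: the function name `classify` is invented here, `None` stands for a missing cell, and any missing cell triggers the Missing label (the real tool may apply a threshold). Constancy is judged on non-missing values only, which matches verbatimCoordinateSystem appearing as both Constant and Missing.

```python
def classify(values):
    """Return alert labels for one column; None marks a missing cell."""
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    distinct = set(present)
    alerts = []
    if len(distinct) == 1:
        alerts.append("Constant")  # one distinct non-missing value
    if n_missing > 0:
        alerts.append("Missing")   # assumption: no minimum threshold
    if values and n_missing == 0 and len(distinct) == len(values):
        alerts.append("Unique")    # every row has its own value
    return alerts

# Tiny illustrative columns built from values shown in this report.
columns = {
    "institutionCode": ["USNM", "USNM", "USNM", "USNM"],
    "verbatimCoordinateSystem": ["Degrees Minutes Seconds", None, None, None],
    "gbifID": ["1316557253", "2235727162", "1316557263", "1316557258"],
}
for name, values in columns.items():
    print(name, classify(values))
```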

Reproduction

Analysis started: 2025-01-14 16:33:33.665307
Analysis finished: 2025-01-14 16:33:50.709019
Duration: 17.04 seconds
Software version: ydata-profiling v4.12.1
Download configuration: config.json
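
The reported duration follows directly from the two timestamps. A small sketch using only the standard library:

```python
from datetime import datetime

# Timestamp layout used by the "Analysis started/finished" lines.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2025-01-14 16:33:33.665307", FMT)
finished = datetime.strptime("2025-01-14 16:33:50.709019", FMT)

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```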

Variables

gbifID
Text

Unique 

Distinct: 724508
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 7245080
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
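
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for two sample values taken from this report. The `CATEGORY_NAMES` mapping and the `category_counts` helper are names invented for this example; the standard library has no script or block lookup, so only categories are covered.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in these samples.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}

def category_counts(text):
    """Count characters of `text` per Unicode general category."""
    codes = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}

print(category_counts("1316557253"))           # a gbifID sample
print(category_counts("2014-11-25 18:32:00"))  # a timestamp sample
```

Multiplying the per-value counts of the timestamp sample by the 724508 rows gives the category totals reported for that fixed-length variable.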

Unique

Unique: 724508
Unique (%): 100.0%

Sample

1st row: 1316557253
2nd row: 2235727162
3rd row: 1316557263
4th row: 1316557258
5th row: 1316557269

Value | Count | Frequency (%)
1316557253 | 1 | < 0.1%
1316557860 | 1 | < 0.1%
1316557419 | 1 | < 0.1%
1316557667 | 1 | < 0.1%
1316557340 | 1 | < 0.1%
1316557263 | 1 | < 0.1%
1316557258 | 1 | < 0.1%
1316557269 | 1 | < 0.1%
1316557294 | 1 | < 0.1%
3311036301 | 1 | < 0.1%
Other values (724498) | 724498 | > 99.9%

Most occurring characters

ValueCountFrequency (%)
1 1858630
25.7%
3 1114337
15.4%
6 924334
12.8%
7 682226
 
9.4%
0 507951
 
7.0%
8 482636
 
6.7%
9 467327
 
6.5%
5 426943
 
5.9%
2 401616
 
5.5%
4 379080
 
5.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 7245080
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 1858630
25.7%
3 1114337
15.4%
6 924334
12.8%
7 682226
 
9.4%
0 507951
 
7.0%
8 482636
 
6.7%
9 467327
 
6.5%
5 426943
 
5.9%
2 401616
 
5.5%
4 379080
 
5.2%

Most occurring scripts

ValueCountFrequency (%)
Common 7245080
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 1858630
25.7%
3 1114337
15.4%
6 924334
12.8%
7 682226
 
9.4%
0 507951
 
7.0%
8 482636
 
6.7%
9 467327
 
6.5%
5 426943
 
5.9%
2 401616
 
5.5%
4 379080
 
5.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7245080
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 1858630
25.7%
3 1114337
15.4%
6 924334
12.8%
7 682226
 
9.4%
0 507951
 
7.0%
8 482636
 
6.7%
9 467327
 
6.5%
5 426943
 
5.9%
2 401616
 
5.5%
4 379080
 
5.2%
Distinct: 6008
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 13765652
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1783
Unique (%): 0.2%

Sample

1st row: 2014-11-25 18:32:00
2nd row: 2024-10-17 09:58:00
3rd row: 2024-10-17 10:44:00
4th row: 2024-08-03 21:41:00
5th row: 2024-10-17 10:17:00

Value | Count | Frequency (%)
2024-10-17 | 379839 | 26.2%
2024-08-03 | 110663 | 7.6%
2014-12-01 | 62342 | 4.3%
2014-11-25 | 62169 | 4.3%
2024-11-18 | 18663 | 1.3%
2014-11-26 | 16425 | 1.1%
2022-07-29 | 12130 | 0.8%
22:06:00 | 11127 | 0.8%
11:08:00 | 10895 | 0.8%
22:09:00 | 9244 | 0.6%
Other values (1703) | 755519 | 52.1%

Most occurring characters

ValueCountFrequency (%)
0 3567224
25.9%
1 2229486
16.2%
2 1840704
13.4%
- 1449016
10.5%
: 1449016
10.5%
4 856419
 
6.2%
(space) 724508
 
5.3%
7 523431
 
3.8%
3 323301
 
2.3%
8 267407
 
1.9%
Other values (3) 535140
 
3.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 10143112
73.7%
Dash Punctuation 1449016
 
10.5%
Other Punctuation 1449016
 
10.5%
Space Separator 724508
 
5.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 3567224
35.2%
1 2229486
22.0%
2 1840704
18.1%
4 856419
 
8.4%
7 523431
 
5.2%
3 323301
 
3.2%
8 267407
 
2.6%
5 251997
 
2.5%
9 156334
 
1.5%
6 126809
 
1.3%
Dash Punctuation
ValueCountFrequency (%)
- 1449016
100.0%
Other Punctuation
ValueCountFrequency (%)
: 1449016
100.0%
Space Separator
ValueCountFrequency (%)
(space) 724508
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 13765652
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 3567224
25.9%
1 2229486
16.2%
2 1840704
13.4%
- 1449016
10.5%
: 1449016
10.5%
4 856419
 
6.2%
(space) 724508
 
5.3%
7 523431
 
3.8%
3 323301
 
2.3%
8 267407
 
1.9%
Other values (3) 535140
 
3.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13765652
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 3567224
25.9%
1 2229486
16.2%
2 1840704
13.4%
- 1449016
10.5%
: 1449016
10.5%
4 856419
 
6.2%
(space) 724508
 
5.3%
7 523431
 
3.8%
3 323301
 
2.3%
8 267407
 
1.9%
Other values (3) 535140
 
3.9%

institutionID
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 47
Median length: 47
Mean length: 47
Min length: 47

Characters and Unicode

Total characters: 34051876
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: http://biocol.org/urn:lsid:biocol.org:col:34871
2nd row: http://biocol.org/urn:lsid:biocol.org:col:34871
3rd row: http://biocol.org/urn:lsid:biocol.org:col:34871
4th row: http://biocol.org/urn:lsid:biocol.org:col:34871
5th row: http://biocol.org/urn:lsid:biocol.org:col:34871

Value | Count | Frequency (%)
http://biocol.org/urn:lsid:biocol.org:col:34871 | 724508 | 100.0%

Most occurring characters

ValueCountFrequency (%)
o 5071556
14.9%
: 3622540
 
10.6%
l 2898032
 
8.5%
r 2173524
 
6.4%
/ 2173524
 
6.4%
i 2173524
 
6.4%
c 2173524
 
6.4%
b 1449016
 
4.3%
. 1449016
 
4.3%
t 1449016
 
4.3%
Other values (12) 9418604
27.7%
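
For a constant column, each character total in the table above is the character's count within the single value multiplied by the number of rows. A brief sketch of that arithmetic, using the value and row count shown in this report:

```python
from collections import Counter

N_ROWS = 724_508
VALUE = "http://biocol.org/urn:lsid:biocol.org:col:34871"

# Occurrences of each character within the one constant value.
per_value = Counter(VALUE)

# Scale to the whole column, since every row holds the same string.
totals = {ch: n * N_ROWS for ch, n in per_value.items()}
total_characters = len(VALUE) * N_ROWS

# Show the three most frequent characters with their share.
for ch, n in sorted(totals.items(), key=lambda kv: -kv[1])[:3]:
    print(ch, n, f"{100 * n / total_characters:.1f}%")
```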

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 23184256
68.1%
Other Punctuation 7245080
 
21.3%
Decimal Number 3622540
 
10.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 5071556
21.9%
l 2898032
12.5%
r 2173524
9.4%
i 2173524
9.4%
c 2173524
9.4%
b 1449016
 
6.2%
t 1449016
 
6.2%
g 1449016
 
6.2%
d 724508
 
3.1%
h 724508
 
3.1%
Other values (4) 2898032
12.5%
Decimal Number
ValueCountFrequency (%)
7 724508
20.0%
8 724508
20.0%
4 724508
20.0%
3 724508
20.0%
1 724508
20.0%
Other Punctuation
ValueCountFrequency (%)
: 3622540
50.0%
/ 2173524
30.0%
. 1449016
 
20.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 23184256
68.1%
Common 10867620
31.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 5071556
21.9%
l 2898032
12.5%
r 2173524
9.4%
i 2173524
9.4%
c 2173524
9.4%
b 1449016
 
6.2%
t 1449016
 
6.2%
g 1449016
 
6.2%
d 724508
 
3.1%
h 724508
 
3.1%
Other values (4) 2898032
12.5%
Common
ValueCountFrequency (%)
: 3622540
33.3%
/ 2173524
20.0%
. 1449016
 
13.3%
7 724508
 
6.7%
8 724508
 
6.7%
4 724508
 
6.7%
3 724508
 
6.7%
1 724508
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 34051876
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 5071556
14.9%
: 3622540
 
10.6%
l 2898032
 
8.5%
r 2173524
 
6.4%
/ 2173524
 
6.4%
i 2173524
 
6.4%
c 2173524
 
6.4%
b 1449016
 
4.3%
. 1449016
 
4.3%
t 1449016
 
4.3%
Other values (12) 9418604
27.7%

collectionID
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 44
Median length: 44
Mean length: 44
Min length: 44

Characters and Unicode

Total characters: 31878352
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac
2nd row: urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac
3rd row: urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac
4th row: urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac
5th row: urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac

Value | Count | Frequency (%)
urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac | 724508 | 100.0%

Most occurring characters

ValueCountFrequency (%)
c 3622540
 
11.4%
- 2898032
 
9.1%
5 2898032
 
9.1%
u 2173524
 
6.8%
f 2173524
 
6.8%
a 2173524
 
6.8%
e 2173524
 
6.8%
4 1449016
 
4.5%
b 1449016
 
4.5%
8 1449016
 
4.5%
Other values (10) 9418604
29.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 17388192
54.5%
Decimal Number 10143112
31.8%
Dash Punctuation 2898032
 
9.1%
Other Punctuation 1449016
 
4.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c 3622540
20.8%
u 2173524
12.5%
f 2173524
12.5%
a 2173524
12.5%
e 2173524
12.5%
b 1449016
 
8.3%
d 1449016
 
8.3%
r 724508
 
4.2%
i 724508
 
4.2%
n 724508
 
4.2%
Decimal Number
ValueCountFrequency (%)
5 2898032
28.6%
4 1449016
14.3%
8 1449016
14.3%
9 1449016
14.3%
2 724508
 
7.1%
0 724508
 
7.1%
3 724508
 
7.1%
6 724508
 
7.1%
Dash Punctuation
ValueCountFrequency (%)
- 2898032
100.0%
Other Punctuation
ValueCountFrequency (%)
: 1449016
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 17388192
54.5%
Common 14490160
45.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
c 3622540
20.8%
u 2173524
12.5%
f 2173524
12.5%
a 2173524
12.5%
e 2173524
12.5%
b 1449016
 
8.3%
d 1449016
 
8.3%
r 724508
 
4.2%
i 724508
 
4.2%
n 724508
 
4.2%
Common
ValueCountFrequency (%)
- 2898032
20.0%
5 2898032
20.0%
4 1449016
10.0%
8 1449016
10.0%
9 1449016
10.0%
: 1449016
10.0%
2 724508
 
5.0%
0 724508
 
5.0%
3 724508
 
5.0%
6 724508
 
5.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 31878352
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
c 3622540
 
11.4%
- 2898032
 
9.1%
5 2898032
 
9.1%
u 2173524
 
6.8%
f 2173524
 
6.8%
a 2173524
 
6.8%
e 2173524
 
6.8%
4 1449016
 
4.5%
b 1449016
 
4.5%
8 1449016
 
4.5%
Other values (10) 9418604
29.5%

institutionCode
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 2898032
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: USNM
2nd row: USNM
3rd row: USNM
4th row: USNM
5th row: USNM

Value | Count | Frequency (%)
usnm | 724508 | 100.0%

Most occurring characters

ValueCountFrequency (%)
U 724508
25.0%
S 724508
25.0%
N 724508
25.0%
M 724508
25.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 2898032
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
U 724508
25.0%
S 724508
25.0%
N 724508
25.0%
M 724508
25.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2898032
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
U 724508
25.0%
S 724508
25.0%
N 724508
25.0%
M 724508
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2898032
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
U 724508
25.0%
S 724508
25.0%
N 724508
25.0%
M 724508
25.0%

collectionCode
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2173524
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PAL
2nd row: PAL
3rd row: PAL
4th row: PAL
5th row: PAL

Value | Count | Frequency (%)
pal | 724508 | 100.0%

Most occurring characters

ValueCountFrequency (%)
P 724508
33.3%
A 724508
33.3%
L 724508
33.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 2173524
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
P 724508
33.3%
A 724508
33.3%
L 724508
33.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 2173524
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
P 724508
33.3%
A 724508
33.3%
L 724508
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2173524
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
P 724508
33.3%
A 724508
33.3%
L 724508
33.3%

datasetName
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 24
Median length: 24
Mean length: 24
Min length: 24

Characters and Unicode

Total characters: 17388192
Distinct characters: 17
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NMNH Paleobiology (USNM)
2nd row: NMNH Paleobiology (USNM)
3rd row: NMNH Paleobiology (USNM)
4th row: NMNH Paleobiology (USNM)
5th row: NMNH Paleobiology (USNM)

Value | Count | Frequency (%)
nmnh | 724508 | 33.3%
paleobiology | 724508 | 33.3%
usnm | 724508 | 33.3%

Most occurring characters

ValueCountFrequency (%)
N 2173524
12.5%
o 2173524
12.5%
(space) 1449016
 
8.3%
l 1449016
 
8.3%
M 1449016
 
8.3%
H 724508
 
4.2%
P 724508
 
4.2%
a 724508
 
4.2%
e 724508
 
4.2%
b 724508
 
4.2%
Other values (7) 5071556
29.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7969588
45.8%
Uppercase Letter 6520572
37.5%
Space Separator 1449016
 
8.3%
Open Punctuation 724508
 
4.2%
Close Punctuation 724508
 
4.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 2173524
27.3%
l 1449016
18.2%
a 724508
 
9.1%
e 724508
 
9.1%
b 724508
 
9.1%
i 724508
 
9.1%
g 724508
 
9.1%
y 724508
 
9.1%
Uppercase Letter
ValueCountFrequency (%)
N 2173524
33.3%
M 1449016
22.2%
H 724508
 
11.1%
P 724508
 
11.1%
U 724508
 
11.1%
S 724508
 
11.1%
Space Separator
ValueCountFrequency (%)
(space) 1449016
100.0%
Open Punctuation
ValueCountFrequency (%)
( 724508
100.0%
Close Punctuation
ValueCountFrequency (%)
) 724508
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 14490160
83.3%
Common 2898032
 
16.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 2173524
15.0%
o 2173524
15.0%
l 1449016
10.0%
M 1449016
10.0%
H 724508
 
5.0%
P 724508
 
5.0%
a 724508
 
5.0%
e 724508
 
5.0%
b 724508
 
5.0%
i 724508
 
5.0%
Other values (4) 2898032
20.0%
Common
ValueCountFrequency (%)
(space) 1449016
50.0%
( 724508
25.0%
) 724508
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 17388192
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 2173524
12.5%
o 2173524
12.5%
(space) 1449016
 
8.3%
l 1449016
 
8.3%
M 1449016
 
8.3%
H 724508
 
4.2%
P 724508
 
4.2%
a 724508
 
4.2%
e 724508
 
4.2%
b 724508
 
4.2%
Other values (7) 5071556
29.2%

basisOfRecord
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 10143112
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: FossilSpecimen
2nd row: FossilSpecimen
3rd row: FossilSpecimen
4th row: FossilSpecimen
5th row: FossilSpecimen

Value | Count | Frequency (%)
fossilspecimen | 724508 | 100.0%

Most occurring characters

ValueCountFrequency (%)
s 1449016
14.3%
i 1449016
14.3%
e 1449016
14.3%
F 724508
7.1%
o 724508
7.1%
l 724508
7.1%
S 724508
7.1%
p 724508
7.1%
c 724508
7.1%
m 724508
7.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 8694096
85.7%
Uppercase Letter 1449016
 
14.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 1449016
16.7%
i 1449016
16.7%
e 1449016
16.7%
o 724508
8.3%
l 724508
8.3%
p 724508
8.3%
c 724508
8.3%
m 724508
8.3%
n 724508
8.3%
Uppercase Letter
ValueCountFrequency (%)
F 724508
50.0%
S 724508
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10143112
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 1449016
14.3%
i 1449016
14.3%
e 1449016
14.3%
F 724508
7.1%
o 724508
7.1%
l 724508
7.1%
S 724508
7.1%
p 724508
7.1%
c 724508
7.1%
m 724508
7.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10143112
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 1449016
14.3%
i 1449016
14.3%
e 1449016
14.3%
F 724508
7.1%
o 724508
7.1%
l 724508
7.1%
S 724508
7.1%
p 724508
7.1%
c 724508
7.1%
m 724508
7.1%

occurrenceID
Text

Unique 

Distinct: 724508
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 63
Median length: 63
Mean length: 63
Min length: 63

Characters and Unicode

Total characters: 45644004
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 724508
Unique (%): 100.0%

Sample

1st row: http://n2t.net/ark:/65665/300009e1e-4f3e-4240-b198-9ea1352b28b5
2nd row: http://n2t.net/ark:/65665/30000a59d-34e5-42b6-837d-ad1b89b6b930
3rd row: http://n2t.net/ark:/65665/3000109b9-b6d6-4ca0-8f0c-ddde53458300
4th row: http://n2t.net/ark:/65665/30001bcd8-61d5-492a-ad56-f8131f24bdaa
5th row: http://n2t.net/ark:/65665/300020a6b-970f-4e44-adb4-6d605be80b0d

Value | Count | Frequency (%)
http://n2t.net/ark:/65665/300009e1e-4f3e-4240-b198-9ea1352b28b5 | 1 | < 0.1%
http://n2t.net/ark:/65665/3004266bd-f222-4227-9817-5905ac4cbc57 | 1 | < 0.1%
http://n2t.net/ark:/65665/30011b937-0eb9-4c75-bea7-c27393598b76 | 1 | < 0.1%
http://n2t.net/ark:/65665/3002cb891-3b1b-49d8-84ee-8558aba9bf13 | 1 | < 0.1%
http://n2t.net/ark:/65665/3000a6387-0469-4278-8ac0-fb0ac6fd37d6 | 1 | < 0.1%
http://n2t.net/ark:/65665/3000109b9-b6d6-4ca0-8f0c-ddde53458300 | 1 | < 0.1%
http://n2t.net/ark:/65665/30001bcd8-61d5-492a-ad56-f8131f24bdaa | 1 | < 0.1%
http://n2t.net/ark:/65665/300020a6b-970f-4e44-adb4-6d605be80b0d | 1 | < 0.1%
http://n2t.net/ark:/65665/300045523-2307-4a34-b888-fb51510870ad | 1 | < 0.1%
http://n2t.net/ark:/65665/300045db2-681e-481a-836e-3643bf3debbf | 1 | < 0.1%
Other values (724498) | 724498 | > 99.9%

Most occurring characters

ValueCountFrequency (%)
/ 3622540
 
7.9%
6 3531516
 
7.7%
- 2898032
 
6.3%
t 2898032
 
6.3%
5 2808306
 
6.2%
a 2263386
 
5.0%
e 2084462
 
4.6%
2 2083197
 
4.6%
3 2083153
 
4.6%
4 2081137
 
4.6%
Other values (16) 19290243
42.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 19743301
43.3%
Lowercase Letter 17206607
37.7%
Other Punctuation 5796064
 
12.7%
Dash Punctuation 2898032
 
6.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 2898032
16.8%
a 2263386
13.2%
e 2084462
12.1%
b 1539404
8.9%
n 1449016
8.4%
c 1358538
7.9%
d 1358025
7.9%
f 1357712
7.9%
k 724508
 
4.2%
r 724508
 
4.2%
Other values (2) 1449016
8.4%
Decimal Number
ValueCountFrequency (%)
6 3531516
17.9%
5 2808306
14.2%
2 2083197
10.6%
3 2083153
10.6%
4 2081137
10.5%
8 1539173
7.8%
9 1539102
7.8%
0 1359375
 
6.9%
7 1359374
 
6.9%
1 1358968
 
6.9%
Other Punctuation
ValueCountFrequency (%)
/ 3622540
62.5%
: 1449016
 
25.0%
. 724508
 
12.5%
Dash Punctuation
ValueCountFrequency (%)
- 2898032
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 28437397
62.3%
Latin 17206607
37.7%

Most frequent character per script

Common
ValueCountFrequency (%)
/ 3622540
12.7%
6 3531516
12.4%
- 2898032
10.2%
5 2808306
9.9%
2 2083197
7.3%
3 2083153
7.3%
4 2081137
7.3%
8 1539173
 
5.4%
9 1539102
 
5.4%
: 1449016
 
5.1%
Other values (4) 4802225
16.9%
Latin
ValueCountFrequency (%)
t 2898032
16.8%
a 2263386
13.2%
e 2084462
12.1%
b 1539404
8.9%
n 1449016
8.4%
c 1358538
7.9%
d 1358025
7.9%
f 1357712
7.9%
k 724508
 
4.2%
r 724508
 
4.2%
Other values (2) 1449016
8.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 45644004
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
/ 3622540
 
7.9%
6 3531516
 
7.7%
- 2898032
 
6.3%
t 2898032
 
6.3%
5 2808306
 
6.2%
a 2263386
 
5.0%
e 2084462
 
4.6%
2 2083197
 
4.6%
3 2083153
 
4.6%
4 2081137
 
4.6%
Other values (16) 19290243
42.3%

catalogNumber
Text

Missing 

Distinct: 655081
Distinct (%): 97.2%
Missing: 50535
Missing (%): 7.0%
Memory size: 5.5 MiB

Length

Max length: 21
Median length: 14
Mean length: 13.86868317
Min length: 7

Characters and Unicode

Total characters: 9347118
Distinct characters: 68
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 638257
Unique (%): 94.7%
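
The percentages and mean length reported for catalogNumber are consistent with being computed over non-missing rows rather than over all observations. That reading is inferred from the figures and is not stated by the report; the arithmetic can be sketched as:

```python
# Counts reported for catalogNumber.
n_obs = 724_508
missing = 50_535
distinct = 655_081
unique = 638_257
total_characters = 9_347_118

# Assumption: the denominators are the non-missing rows only.
present = n_obs - missing
distinct_pct = 100 * distinct / present
unique_pct = 100 * unique / present
mean_length = total_characters / present

print(f"Non-missing rows: {present}")
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Unique (%): {unique_pct:.1f}%")
print(f"Mean length: {mean_length:.8f}")
```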

Sample

1st row: USNM SD38013 0000
2nd row: USNM PAL706968
3rd row: USNM PAL248638
4th row: USNM PAL456768
5th row: USNM PAL297724

Value | Count | Frequency (%)
usnm | 673973 | 47.8%
0000 | 59177 | 4.2%
0002 | 159 | < 0.1%
0001 | 159 | < 0.1%
0003 | 149 | < 0.1%
0004 | 145 | < 0.1%
0005 | 137 | < 0.1%
0006 | 116 | < 0.1%
0007 | 113 | < 0.1%
0008 | 105 | < 0.1%
Other values (652937) | 674632 | 47.9%
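
The value table for catalogNumber lists lowercase tokens such as "usnm" and "0000" rather than whole catalog numbers, which suggests the values are split into words before counting. A minimal sketch of such a word count follows. It assumes lowercasing and whitespace splitting; the exact tokenization used by the profiler is not shown in this report, and `word_counts` is a name invented for the example.

```python
from collections import Counter

# The five sample rows shown for catalogNumber.
samples = [
    "USNM SD38013 0000",
    "USNM PAL706968",
    "USNM PAL248638",
    "USNM PAL456768",
    "USNM PAL297724",
]

def word_counts(values):
    """Count lowercase, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts

counts = word_counts(samples)
total = sum(counts.values())
# Frequency is taken over all tokens, not over rows.
for word, n in counts.most_common(2):
    print(word, n, f"{100 * n / total:.1f}%")
```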

Most occurring characters

ValueCountFrequency (%)
S 742844
 
7.9%
(space) 734892
 
7.9%
M 712585
 
7.6%
N 674519
 
7.2%
U 674214
 
7.2%
0 557394
 
6.0%
P 521957
 
5.6%
A 511374
 
5.5%
L 497601
 
5.3%
1 444334
 
4.8%
Other values (58) 3275404
35.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4546936
48.6%
Decimal Number 4063828
43.5%
Space Separator 734892
 
7.9%
Other Punctuation 741
 
< 0.1%
Lowercase Letter 690
 
< 0.1%
Dash Punctuation 30
 
< 0.1%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 742844
16.3%
M 712585
15.7%
N 674519
14.8%
U 674214
14.8%
P 521957
11.5%
A 511374
11.2%
L 497601
10.9%
D 65264
 
1.4%
C 43992
 
1.0%
O 38427
 
0.8%
Other values (16) 64159
 
1.4%
Lowercase Letter
ValueCountFrequency (%)
a 130
18.8%
b 126
18.3%
d 61
8.8%
e 54
7.8%
c 50
 
7.2%
o 38
 
5.5%
l 31
 
4.5%
f 27
 
3.9%
r 26
 
3.8%
k 23
 
3.3%
Other values (16) 124
18.0%
Decimal Number
ValueCountFrequency (%)
0 557394
13.7%
1 444334
10.9%
3 432709
10.6%
5 423320
10.4%
2 419515
10.3%
4 412173
10.1%
6 395612
9.7%
7 350867
8.6%
8 318934
7.8%
9 308970
7.6%
Other Punctuation
ValueCountFrequency (%)
' 704
95.0%
" 35
 
4.7%
, 2
 
0.3%
Space Separator
ValueCountFrequency (%)
(space) 734892
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 30
100.0%
Math Symbol
ValueCountFrequency (%)
+ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 4799492
51.3%
Latin 4547626
48.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 742844
16.3%
M 712585
15.7%
N 674519
14.8%
U 674214
14.8%
P 521957
11.5%
A 511374
11.2%
L 497601
10.9%
D 65264
 
1.4%
C 43992
 
1.0%
O 38427
 
0.8%
Other values (42) 64849
 
1.4%
Common
ValueCountFrequency (%)
(space) 734892
15.3%
0 557394
11.6%
1 444334
9.3%
3 432709
9.0%
5 423320
8.8%
2 419515
8.7%
4 412173
8.6%
6 395612
8.2%
7 350867
7.3%
8 318934
6.6%
Other values (6) 309742
6.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9347118
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 742844
 
7.9%
(space) 734892
 
7.9%
M 712585
 
7.6%
N 674519
 
7.2%
U 674214
 
7.2%
0 557394
 
6.0%
P 521957
 
5.6%
A 511374
 
5.5%
L 497601
 
5.3%
1 444334
 
4.8%
Other values (58) 3275404
35.0%

recordNumber
Text

Missing 

Distinct: 39872
Distinct (%): 82.1%
Missing: 675939
Missing (%): 93.3%
Memory size: 5.5 MiB

Length

Max length: 48
Median length: 5
Mean length: 6.205336737
Min length: 1

Characters and Unicode

Total characters: 301387
Distinct characters: 77
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 37721
Unique (%): 77.7%

Sample

1st row: PALMER LOC 1479
2nd row: 75432
3rd row: H-11
4th row: E73-59
5th row: Gaxin Loc 178-36

Value | Count | Frequency (%)
loc | 1685 | 2.9%
emlong | 951 | 1.7%
urbac | 803 | 1.4%
olson | 263 | 0.5%
sample | 209 | 0.4%
hass | 177 | 0.3%
rb | 171 | 0.3%
c-29 | 169 | 0.3%
gibson | 163 | 0.3%
wyo | 162 | 0.3%
Other values (38506) | 52476 | 91.7%

Most occurring characters

ValueCountFrequency (%)
1 30021
 
10.0%
5 27939
 
9.3%
7 23690
 
7.9%
2 21570
 
7.2%
3 20657
 
6.9%
6 18998
 
6.3%
8 18791
 
6.2%
0 17388
 
5.8%
4 17006
 
5.6%
- 16559
 
5.5%
Other values (67) 88768
29.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 211386
70.1%
Uppercase Letter 58763
 
19.5%
Dash Punctuation 16559
 
5.5%
Space Separator 8660
 
2.9%
Other Punctuation 3199
 
1.1%
Lowercase Letter 2471
 
0.8%
Math Symbol 145
 
< 0.1%
Close Punctuation 102
 
< 0.1%
Open Punctuation 101
 
< 0.1%
Connector Punctuation 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
O 5593
 
9.5%
E 4986
 
8.5%
L 4981
 
8.5%
C 4891
 
8.3%
S 4262
 
7.3%
A 4151
 
7.1%
M 3190
 
5.4%
R 3078
 
5.2%
N 3020
 
5.1%
B 2373
 
4.0%
Other values (16) 18238
31.0%
Lowercase Letter
ValueCountFrequency (%)
o 425
17.2%
n 315
12.7%
a 217
8.8%
y 190
7.7%
l 189
7.6%
c 189
7.6%
e 172
7.0%
i 169
 
6.8%
r 167
 
6.8%
t 82
 
3.3%
Other values (14) 356
14.4%
Decimal Number
ValueCountFrequency (%)
1 30021
14.2%
5 27939
13.2%
7 23690
11.2%
2 21570
10.2%
3 20657
9.8%
6 18998
9.0%
8 18791
8.9%
0 17388
8.2%
4 17006
8.0%
9 15326
7.3%
Other Punctuation
ValueCountFrequency (%)
/ 1630
51.0%
. 955
29.9%
, 516
 
16.1%
? 56
 
1.8%
' 22
 
0.7%
; 12
 
0.4%
# 5
 
0.2%
: 2
 
0.1%
& 1
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+ 135
93.1%
= 10
 
6.9%
Close Punctuation
ValueCountFrequency (%)
) 100
98.0%
} 2
 
2.0%
Dash Punctuation
ValueCountFrequency (%)
- 16559
100.0%
Space Separator
ValueCountFrequency (%)
(space) 8660
100.0%
Open Punctuation
ValueCountFrequency (%)
( 101
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 240153
79.7%
Latin 61234
 
20.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
O 5593
 
9.1%
E 4986
 
8.1%
L 4981
 
8.1%
C 4891
 
8.0%
S 4262
 
7.0%
A 4151
 
6.8%
M 3190
 
5.2%
R 3078
 
5.0%
N 3020
 
4.9%
B 2373
 
3.9%
Other values (40) 20709
33.8%
Common
ValueCountFrequency (%)
1 30021
12.5%
5 27939
11.6%
7 23690
9.9%
2 21570
9.0%
3 20657
8.6%
6 18998
7.9%
8 18791
7.8%
0 17388
7.2%
4 17006
7.1%
- 16559
6.9%
Other values (17) 27534
11.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 301387
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 30021
 
10.0%
5 27939
 
9.3%
7 23690
 
7.9%
2 21570
 
7.2%
3 20657
 
6.9%
6 18998
 
6.3%
8 18791
 
6.2%
0 17388
 
5.8%
4 17006
 
5.6%
- 16559
 
5.5%
Other values (67) 88768
29.5%

recordedBy
Text

Missing 

Distinct: 3957
Distinct (%): 2.5%
Missing: 563497
Missing (%): 77.8%
Memory size: 5.5 MiB

Length

Max length: 119
Median length: 61
Mean length: 10.93147052
Min length: 1

Characters and Unicode

Total characters: 1760087
Distinct characters: 61
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
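
As a rough illustration of how per-category character counts like the ones in this report could be derived, the sketch below tallies Unicode general categories with Python's standard unicodedata module. It is not the profiler's own implementation: the helper name category_counts and the CATEGORY_NAMES mapping are hypothetical, the mapping covers only a subset of categories, and the sample values are taken from the recordedBy sample rows. Scripts and blocks are not covered, because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Long names for a subset of Unicode general-category codes (illustrative only).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category, skipping missing values."""
    counts = Counter()
    for value in values:
        if value is None:  # treat None as a missing cell
            continue
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code when no long name is listed.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    sample = ["R. Snow", "D. Palmer", "W. Woodring & L. Lupher", None]
    for name, n in category_counts(sample).most_common():
        print(name, n)
```

Dividing each count by the total number of characters would give the Frequency (%) column shown in the category tables.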

Unique

Unique: 1329
Unique (%): 0.8%

Sample

1st row: R. Snow
2nd row: D. Palmer
3rd row: W. Woodring & L. Lupher
4th row: James
5th row: Ross
ValueCountFrequency (%)
21228
 
6.1%
j 19727
 
5.7%
r 15376
 
4.5%
w 14249
 
4.1%
a 12060
 
3.5%
james 11468
 
3.3%
l 10757
 
3.1%
woodring 9356
 
2.7%
pribyl 8943
 
2.6%
c 7362
 
2.1%
Other values (2560) 214833
62.2%

Most occurring characters

ValueCountFrequency (%)
184348
 
10.5%
e 133592
 
7.6%
. 131492
 
7.5%
r 102132
 
5.8%
o 91217
 
5.2%
l 89319
 
5.1%
n 89079
 
5.1%
a 84651
 
4.8%
i 80231
 
4.6%
s 70452
 
4.0%
Other values (51) 703574
40.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1075097
61.1%
Uppercase Letter 337569
 
19.2%
Space Separator 184348
 
10.5%
Other Punctuation 160539
 
9.1%
Dash Punctuation 2462
 
0.1%
Open Punctuation 36
 
< 0.1%
Close Punctuation 36
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 133592
12.4%
r 102132
9.5%
o 91217
 
8.5%
l 89319
 
8.3%
n 89079
 
8.3%
a 84651
 
7.9%
i 80231
 
7.5%
s 70452
 
6.6%
t 48464
 
4.5%
d 48173
 
4.5%
Other values (18) 237787
22.1%
Uppercase Letter
ValueCountFrequency (%)
J 36000
 
10.7%
W 33626
 
10.0%
A 27177
 
8.1%
R 24357
 
7.2%
P 20822
 
6.2%
C 20595
 
6.1%
M 19813
 
5.9%
S 19479
 
5.8%
L 18797
 
5.6%
H 15162
 
4.5%
Other values (15) 101741
30.1%
Other Punctuation
ValueCountFrequency (%)
. 131492
81.9%
& 21228
 
13.2%
, 7789
 
4.9%
' 30
 
< 0.1%
Space Separator
ValueCountFrequency (%)
184348
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2462
100.0%
Open Punctuation
ValueCountFrequency (%)
( 36
100.0%
Close Punctuation
ValueCountFrequency (%)
) 36
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1412666
80.3%
Common 347421
 
19.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 133592
 
9.5%
r 102132
 
7.2%
o 91217
 
6.5%
l 89319
 
6.3%
n 89079
 
6.3%
a 84651
 
6.0%
i 80231
 
5.7%
s 70452
 
5.0%
t 48464
 
3.4%
d 48173
 
3.4%
Other values (43) 575356
40.7%
Common
ValueCountFrequency (%)
184348
53.1%
. 131492
37.8%
& 21228
 
6.1%
, 7789
 
2.2%
- 2462
 
0.7%
( 36
 
< 0.1%
) 36
 
< 0.1%
' 30
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1760046
> 99.9%
None 41
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
184348
 
10.5%
e 133592
 
7.6%
. 131492
 
7.5%
r 102132
 
5.8%
o 91217
 
5.2%
l 89319
 
5.1%
n 89079
 
5.1%
a 84651
 
4.8%
i 80231
 
4.6%
s 70452
 
4.0%
Other values (49) 703533
40.0%
None
ValueCountFrequency (%)
ú 40
97.6%
č 1
 
2.4%
Distinct: 686
Distinct (%): 0.1%
Missing: 303
Missing (%): < 0.1%
Memory size: 5.5 MiB

Length

Max length: 5
Median length: 1
Mean length: 1.088909908
Min length: 1

Characters and Unicode

Total characters: 788594
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 253
Unique (%): < 0.1%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 25
5th row: 1
ValueCountFrequency (%)
1 594864
82.1%
2 29629
 
4.1%
3 14673
 
2.0%
4 9858
 
1.4%
5 7420
 
1.0%
6 5780
 
0.8%
7 4510
 
0.6%
8 3695
 
0.5%
10 3151
 
0.4%
9 3129
 
0.4%
Other values (676) 47496
 
6.6%

Most occurring characters

ValueCountFrequency (%)
1 624602
79.2%
2 43921
 
5.6%
0 28217
 
3.6%
3 23988
 
3.0%
5 17293
 
2.2%
4 17104
 
2.2%
6 10762
 
1.4%
7 9146
 
1.2%
8 7494
 
1.0%
9 6067
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 788594
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 624602
79.2%
2 43921
 
5.6%
0 28217
 
3.6%
3 23988
 
3.0%
5 17293
 
2.2%
4 17104
 
2.2%
6 10762
 
1.4%
7 9146
 
1.2%
8 7494
 
1.0%
9 6067
 
0.8%

Most occurring scripts

ValueCountFrequency (%)
Common 788594
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 624602
79.2%
2 43921
 
5.6%
0 28217
 
3.6%
3 23988
 
3.0%
5 17293
 
2.2%
4 17104
 
2.2%
6 10762
 
1.4%
7 9146
 
1.2%
8 7494
 
1.0%
9 6067
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 788594
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 624602
79.2%
2 43921
 
5.6%
0 28217
 
3.6%
3 23988
 
3.0%
5 17293
 
2.2%
4 17104
 
2.2%
6 10762
 
1.4%
7 9146
 
1.2%
8 7494
 
1.0%
9 6067
 
0.8%

preparations
Text

Missing 

Distinct: 381
Distinct (%): 0.3%
Missing: 591600
Missing (%): 81.7%
Memory size: 5.5 MiB

Length

Max length: 94
Median length: 91
Mean length: 16.14684594
Min length: 3

Characters and Unicode

Total characters: 2146045
Distinct characters: 51
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 130
Unique (%): 0.1%

Sample

1st row: Boxes and vials
2nd row: Thin sections
3rd row: Secondary microslides
4th row: Wet
5th row: plastic container
ValueCountFrequency (%)
microslide 45697
17.5%
microslides 34837
13.4%
secondary 33230
12.8%
remnants 26629
10.2%
thin 24547
9.4%
sections 24011
9.2%
no 15071
 
5.8%
with 10919
 
4.2%
unsectioned 9109
 
3.5%
bottle 3934
 
1.5%
Other values (53) 32636
12.5%

Most occurring characters

ValueCountFrequency (%)
i 236706
11.0%
s 211809
9.9%
e 210870
9.8%
n 172401
 
8.0%
o 167894
 
7.8%
c 147453
 
6.9%
r 146905
 
6.8%
d 130804
 
6.1%
127712
 
6.0%
l 92477
 
4.3%
Other values (41) 501014
23.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1849130
86.2%
Uppercase Letter 159097
 
7.4%
Space Separator 127712
 
6.0%
Other Punctuation 10096
 
0.5%
Open Punctuation 5
 
< 0.1%
Close Punctuation 5
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 236706
12.8%
s 211809
11.5%
e 210870
11.4%
n 172401
9.3%
o 167894
9.1%
c 147453
8.0%
r 146905
7.9%
d 130804
7.1%
l 92477
 
5.0%
t 85481
 
4.6%
Other values (14) 246330
13.3%
Uppercase Letter
ValueCountFrequency (%)
M 46146
29.0%
S 38065
23.9%
T 27401
17.2%
U 10261
 
6.4%
B 6095
 
3.8%
P 5926
 
3.7%
C 5880
 
3.7%
O 5094
 
3.2%
E 3082
 
1.9%
R 2197
 
1.4%
Other values (11) 8950
 
5.6%
Other Punctuation
ValueCountFrequency (%)
; 9850
97.6%
& 157
 
1.6%
/ 89
 
0.9%
Space Separator
ValueCountFrequency (%)
127712
100.0%
Open Punctuation
ValueCountFrequency (%)
( 5
100.0%
Close Punctuation
ValueCountFrequency (%)
) 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2008227
93.6%
Common 137818
 
6.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 236706
11.8%
s 211809
10.5%
e 210870
10.5%
n 172401
8.6%
o 167894
8.4%
c 147453
 
7.3%
r 146905
 
7.3%
d 130804
 
6.5%
l 92477
 
4.6%
t 85481
 
4.3%
Other values (35) 405427
20.2%
Common
ValueCountFrequency (%)
127712
92.7%
; 9850
 
7.1%
& 157
 
0.1%
/ 89
 
0.1%
( 5
 
< 0.1%
) 5
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2146045
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 236706
11.0%
s 211809
9.9%
e 210870
9.8%
n 172401
 
8.0%
o 167894
 
7.8%
c 147453
 
6.9%
r 146905
 
6.8%
d 130804
 
6.1%
127712
 
6.0%
l 92477
 
4.3%
Other values (41) 501014
23.3%

associatedMedia
Text

Missing 

Distinct: 84848
Distinct (%): 97.2%
Missing: 637195
Missing (%): 87.9%
Memory size: 5.5 MiB

Length

Max length: 1069
Median length: 1059
Mean length: 58.46043544
Min length: 48

Characters and Unicode

Total characters: 5104356
Distinct characters: 31
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 83728
Unique (%): 95.9%

Sample

1st row: https://collections.nmnh.si.edu/media/?i=12688993
2nd row: https://collections.nmnh.si.edu/media/?i=12689748
3rd row: https://collections.nmnh.si.edu/media/?i=15308925
4th row: https://collections.nmnh.si.edu/media/?i=11098487
5th row: https://collections.nmnh.si.edu/media/?i=12770417; 12770964
ValueCountFrequency (%)
https://collections.nmnh.si.edu/media/?i=16189563 203
 
0.1%
https://collections.nmnh.si.edu/media/?i=16053361 170
 
0.1%
10035032 87
 
0.1%
https://collections.nmnh.si.edu/media/?i=13958963 76
 
< 0.1%
https://collections.nmnh.si.edu/media/?i=16647294 48
 
< 0.1%
https://collections.nmnh.si.edu/media/?i=16725276 37
 
< 0.1%
https://collections.nmnh.si.edu/media/?i=16115280 33
 
< 0.1%
10320533 30
 
< 0.1%
10320530 29
 
< 0.1%
10320532 26
 
< 0.1%
Other values (167678) 170293
99.6%

Most occurring characters

ValueCountFrequency (%)
i 349252
 
6.8%
/ 349252
 
6.8%
n 261939
 
5.1%
s 261939
 
5.1%
t 261939
 
5.1%
. 261939
 
5.1%
e 261939
 
5.1%
1 256693
 
5.0%
d 174626
 
3.4%
m 174626
 
3.4%
Other values (21) 2490212
48.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2706703
53.0%
Decimal Number 1357085
26.6%
Other Punctuation 869536
 
17.0%
Math Symbol 87313
 
1.7%
Space Separator 83719
 
1.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 349252
12.9%
n 261939
9.7%
s 261939
9.7%
t 261939
9.7%
e 261939
9.7%
d 174626
 
6.5%
m 174626
 
6.5%
h 174626
 
6.5%
l 174626
 
6.5%
o 174626
 
6.5%
Other values (4) 436565
16.1%
Decimal Number
ValueCountFrequency (%)
1 256693
18.9%
2 156668
11.5%
8 152923
11.3%
0 142520
10.5%
7 132916
9.8%
4 117629
8.7%
6 106828
7.9%
3 103174
7.6%
9 97272
 
7.2%
5 90462
 
6.7%
Other Punctuation
ValueCountFrequency (%)
/ 349252
40.2%
. 261939
30.1%
? 87313
 
10.0%
: 87313
 
10.0%
; 83719
 
9.6%
Math Symbol
ValueCountFrequency (%)
= 87313
100.0%
Space Separator
ValueCountFrequency (%)
83719
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2706703
53.0%
Common 2397653
47.0%

Most frequent character per script

Common
ValueCountFrequency (%)
/ 349252
14.6%
. 261939
10.9%
1 256693
10.7%
2 156668
 
6.5%
8 152923
 
6.4%
0 142520
 
5.9%
7 132916
 
5.5%
4 117629
 
4.9%
6 106828
 
4.5%
3 103174
 
4.3%
Other values (7) 617111
25.7%
Latin
ValueCountFrequency (%)
i 349252
12.9%
n 261939
9.7%
s 261939
9.7%
t 261939
9.7%
e 261939
9.7%
d 174626
 
6.5%
m 174626
 
6.5%
h 174626
 
6.5%
l 174626
 
6.5%
o 174626
 
6.5%
Other values (4) 436565
16.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5104356
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 349252
 
6.8%
/ 349252
 
6.8%
n 261939
 
5.1%
s 261939
 
5.1%
t 261939
 
5.1%
. 261939
 
5.1%
e 261939
 
5.1%
1 256693
 
5.0%
d 174626
 
3.4%
m 174626
 
3.4%
Other values (21) 2490212
48.8%

occurrenceRemarks
Text

Missing 

Distinct: 38195
Distinct (%): 44.3%
Missing: 638259
Missing (%): 88.1%
Memory size: 5.5 MiB

Length

Max length: 1257
Median length: 1240
Mean length: 357.4557966
Min length: 5

Characters and Unicode

Total characters: 30830205
Distinct characters: 92
Distinct categories: 13
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 36384
Unique (%): 42.2%

Sample

1st row: Specimen comments: Associated w/ #0343 and #0346. | Body size code: medium; Taphonomic Significance: Human modification | Features: Weathering, diagenesis: N/A; Burn Color: none; Burn Modification: none; Cut: 0; Scrape: 0; Chop: 0; Loading Notch: 0; Counterblow: 0; Anvil pit: 0; Carn pit: 0; Carn score: 0; Carn furrow: 0; Carn punct: 0; Carn crenulation: 0; Rodent gnaw: none
2nd row: EMu record was created as part of the Smithsonian Institution Digitization Program Office (SI DPO) mass digitization pilot project to support the National Science Foundation Advancing Digitization of Biodiversity Collections Eastern Pacific Invertebrates of the Cenozoic Collaborative Thematic Collections Network (NSF ADBC EPICC TCN). The SI DPO mass digitization pilot workflow includes crowdsourced label transcription through the SI Transcription Center.; Information generated by NMNH Department of Paleobiology volunteers: Specimen count and preliminary identification to class.
3rd row: EMu record was created as part of the Smithsonian Institution Digitization Program Office (SI DPO) mass digitization pilot project to support the National Science Foundation Advancing Digitization of Biodiversity Collections Eastern Pacific Invertebrates of the Cenozoic Collaborative Thematic Collections Network (NSF ADBC EPICC TCN). The SI DPO mass digitization pilot workflow includes crowdsourced label transcription through the SI Transcription Center.; Information generated by NMNH Department of Paleobiology volunteers: Specimen count and preliminary identification to class.
4th row: The fossil is marked with the original Green River number and is often mistaken for the USNM number. That original Green River collection number is 75432.; Numbers associated with this fossil: 578683. 75432. 40193.
5th row: EMu record was created as part of the Smithsonian Institution Digitization Program Office (SI DPO) mass digitization pilot project to support the National Science Foundation Advancing Digitization of Biodiversity Collections Eastern Pacific Invertebrates of the Cenozoic Collaborative Thematic Collections Network (NSF ADBC EPICC TCN). The SI DPO mass digitization pilot workflow includes crowdsourced label transcription through the SI Transcription Center.; Additional label information: This locality is at approximately the same horizon as USGS CENO LOC 5686, in which a shale fauna was collected | See USGS CENO LOC 5703; Verbatim Lithostratigraphy: Tejon Formation; Sandstone forming the upper member of the Tejon | Discontinuous lenses in a soft brownish sandstone, less than 100 feet stratigraphically below the overlying diatomaceous shale; Verbatim Chronostratigraphy: Eocene
ValueCountFrequency (%)
the 291111
 
6.9%
digitization 174338
 
4.1%
of 164357
 
3.9%
si 100203
 
2.4%
collections 99405
 
2.4%
number 86263
 
2.0%
is 85833
 
2.0%
mass 74949
 
1.8%
dpo 74947
 
1.8%
with 57325
 
1.4%
Other values (66970) 3009589
71.3%

Most occurring characters

ValueCountFrequency (%)
4132071
 
13.4%
i 2608470
 
8.5%
t 2311910
 
7.5%
o 2139574
 
6.9%
e 2129723
 
6.9%
n 1708168
 
5.5%
a 1671073
 
5.4%
r 1554155
 
5.0%
s 1249854
 
4.1%
c 981043
 
3.2%
Other values (82) 10344164
33.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 22179429
71.9%
Space Separator 4132071
 
13.4%
Uppercase Letter 3027854
 
9.8%
Decimal Number 712264
 
2.3%
Other Punctuation 536260
 
1.7%
Open Punctuation 103223
 
0.3%
Close Punctuation 103221
 
0.3%
Math Symbol 26815
 
0.1%
Dash Punctuation 8726
 
< 0.1%
Connector Punctuation 335
 
< 0.1%
Other values (3) 7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 2608470
11.8%
t 2311910
10.4%
o 2139574
9.6%
e 2129723
9.6%
n 1708168
 
7.7%
a 1671073
 
7.5%
r 1554155
 
7.0%
s 1249854
 
5.6%
c 981043
 
4.4%
l 809850
 
3.7%
Other values (16) 5015609
22.6%
Uppercase Letter
ValueCountFrequency (%)
C 475177
15.7%
S 312569
10.3%
N 284886
9.4%
I 260808
8.6%
P 248493
8.2%
D 239558
7.9%
T 217566
 
7.2%
E 157599
 
5.2%
A 134747
 
4.5%
O 129263
 
4.3%
Other values (16) 567188
18.7%
Other Punctuation
ValueCountFrequency (%)
. 253963
47.4%
: 134709
25.1%
; 123326
23.0%
, 10668
 
2.0%
/ 5315
 
1.0%
& 3632
 
0.7%
? 1748
 
0.3%
" 1387
 
0.3%
# 984
 
0.2%
' 412
 
0.1%
Other values (5) 116
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 96673
13.6%
5 95617
13.4%
0 89759
12.6%
4 70754
9.9%
2 67002
9.4%
7 66254
9.3%
8 64489
9.1%
6 57819
8.1%
3 52279
7.3%
9 51618
7.2%
Math Symbol
ValueCountFrequency (%)
| 24725
92.2%
+ 1585
 
5.9%
> 212
 
0.8%
< 199
 
0.7%
= 94
 
0.4%
Open Punctuation
ValueCountFrequency (%)
( 103206
> 99.9%
[ 17
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 103204
> 99.9%
] 17
 
< 0.1%
Space Separator
ValueCountFrequency (%)
4132071
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 8726
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 335
100.0%
Initial Punctuation
ValueCountFrequency (%)
4
100.0%
Final Punctuation
ValueCountFrequency (%)
2
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 25207283
81.8%
Common 5622922
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 2608470
 
10.3%
t 2311910
 
9.2%
o 2139574
 
8.5%
e 2129723
 
8.4%
n 1708168
 
6.8%
a 1671073
 
6.6%
r 1554155
 
6.2%
s 1249854
 
5.0%
c 981043
 
3.9%
l 809850
 
3.2%
Other values (42) 8043463
31.9%
Common
ValueCountFrequency (%)
4132071
73.5%
. 253963
 
4.5%
: 134709
 
2.4%
; 123326
 
2.2%
( 103206
 
1.8%
) 103204
 
1.8%
1 96673
 
1.7%
5 95617
 
1.7%
0 89759
 
1.6%
4 70754
 
1.3%
Other values (30) 419640
 
7.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 30830198
> 99.9%
Punctuation 7
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4132071
 
13.4%
i 2608470
 
8.5%
t 2311910
 
7.5%
o 2139574
 
6.9%
e 2129723
 
6.9%
n 1708168
 
5.5%
a 1671073
 
5.4%
r 1554155
 
5.0%
s 1249854
 
4.1%
c 981043
 
3.2%
Other values (79) 10344157
33.6%
Punctuation
ValueCountFrequency (%)
4
57.1%
2
28.6%
1
 
14.3%

fieldNumber
Text

Missing 

Distinct: 1516
Distinct (%): 34.0%
Missing: 720044
Missing (%): 99.4%
Memory size: 5.5 MiB

Length

Max length: 209
Median length: 45
Mean length: 35.25537634
Min length: 1

Characters and Unicode

Total characters: 157380
Distinct characters: 72
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1229
Unique (%): 27.5%

Sample

1st row: MTC-08009; MTC-08009B; MTC-08009B (A); MTC-08009B (B)
2nd row: 217
3rd row: YP79-2
4th row: TDP31
5th row: 82-10; 82-19; 82-21; 82-22; 82-4; 82-6; 82-7
ValueCountFrequency (%)
82-10 767
 
4.2%
82-21 767
 
4.2%
82-22 767
 
4.2%
82-4 767
 
4.2%
82-6 767
 
4.2%
82-7 767
 
4.2%
82-19 767
 
4.2%
mtc-04028dd 329
 
1.8%
mtc-04028h 329
 
1.8%
mtc-04028gg 329
 
1.8%
Other values (1502) 11759
64.9%

Most occurring characters

ValueCountFrequency (%)
0 18832
12.0%
- 15944
10.1%
2 14513
9.2%
13651
 
8.7%
; 12694
 
8.1%
8 11928
 
7.6%
C 9870
 
6.3%
M 9201
 
5.8%
T 8674
 
5.5%
4 7381
 
4.7%
Other values (62) 34692
22.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 72021
45.8%
Uppercase Letter 40992
26.0%
Dash Punctuation 15944
 
10.1%
Space Separator 13651
 
8.7%
Other Punctuation 12856
 
8.2%
Lowercase Letter 1716
 
1.1%
Close Punctuation 100
 
0.1%
Open Punctuation 100
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c 290
16.9%
a 205
11.9%
m 201
11.7%
e 185
10.8%
l 159
9.3%
p 150
8.7%
o 130
7.6%
t 77
 
4.5%
r 70
 
4.1%
i 55
 
3.2%
Other values (16) 194
11.3%
Uppercase Letter
ValueCountFrequency (%)
C 9870
24.1%
M 9201
22.4%
T 8674
21.2%
A 1535
 
3.7%
G 1513
 
3.7%
B 1509
 
3.7%
E 1291
 
3.1%
D 1285
 
3.1%
F 1161
 
2.8%
H 1137
 
2.8%
Other values (15) 3816
 
9.3%
Decimal Number
ValueCountFrequency (%)
0 18832
26.1%
2 14513
20.2%
8 11928
16.6%
4 7381
 
10.2%
1 6730
 
9.3%
3 3699
 
5.1%
5 3595
 
5.0%
7 2000
 
2.8%
9 1780
 
2.5%
6 1563
 
2.2%
Other Punctuation
ValueCountFrequency (%)
; 12694
98.7%
. 62
 
0.5%
, 49
 
0.4%
# 34
 
0.3%
/ 10
 
0.1%
& 4
 
< 0.1%
' 3
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 15944
100.0%
Space Separator
ValueCountFrequency (%)
13651
100.0%
Close Punctuation
ValueCountFrequency (%)
) 100
100.0%
Open Punctuation
ValueCountFrequency (%)
( 100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 114672
72.9%
Latin 42708
 
27.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 9870
23.1%
M 9201
21.5%
T 8674
20.3%
A 1535
 
3.6%
G 1513
 
3.5%
B 1509
 
3.5%
E 1291
 
3.0%
D 1285
 
3.0%
F 1161
 
2.7%
H 1137
 
2.7%
Other values (41) 5532
13.0%
Common
ValueCountFrequency (%)
0 18832
16.4%
- 15944
13.9%
2 14513
12.7%
13651
11.9%
; 12694
11.1%
8 11928
10.4%
4 7381
 
6.4%
1 6730
 
5.9%
3 3699
 
3.2%
5 3595
 
3.1%
Other values (11) 5705
 
5.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 157380
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 18832
12.0%
- 15944
10.1%
2 14513
9.2%
13651
 
8.7%
; 12694
 
8.1%
8 11928
 
7.6%
C 9870
 
6.3%
M 9201
 
5.8%
T 8674
 
5.5%
4 7381
 
4.7%
Other values (62) 34692
22.0%

eventDate
Text

Missing 

Distinct: 17617
Distinct (%): 6.5%
Missing: 453741
Missing (%): 62.6%
Memory size: 5.5 MiB

Length

Max length: 21
Median length: 18
Mean length: 7.649425521
Min length: 4

Characters and Unicode

Total characters: 2071212
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5897
Unique (%): 2.2%

Sample

1st row: 1985-01-23
2nd row: 1974
3rd row: 1980
4th row: 1963
5th row: 1956
ValueCountFrequency (%)
1910/1917 6616
 
2.4%
1991/1993 6310
 
2.3%
1999 3773
 
1.4%
1980 3739
 
1.4%
1982 3572
 
1.3%
1984-02 3350
 
1.2%
1998 3319
 
1.2%
1997 3308
 
1.2%
1995 3121
 
1.2%
2001 2926
 
1.1%
Other values (17607) 230733
85.2%

Most occurring characters

ValueCountFrequency (%)
1 451090
21.8%
9 375304
18.1%
- 289583
14.0%
0 255834
12.4%
8 133815
 
6.5%
7 127284
 
6.1%
2 109700
 
5.3%
6 89305
 
4.3%
3 74141
 
3.6%
4 71285
 
3.4%
Other values (3) 93871
 
4.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1757750
84.9%
Dash Punctuation 289583
 
14.0%
Other Punctuation 23879
 
1.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 451090
25.7%
9 375304
21.4%
0 255834
14.6%
8 133815
 
7.6%
7 127284
 
7.2%
2 109700
 
6.2%
6 89305
 
5.1%
3 74141
 
4.2%
4 71285
 
4.1%
5 69992
 
4.0%
Other Punctuation
ValueCountFrequency (%)
/ 23877
> 99.9%
, 2
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 289583
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2071212
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 451090
21.8%
9 375304
18.1%
- 289583
14.0%
0 255834
12.4%
8 133815
 
6.5%
7 127284
 
6.1%
2 109700
 
5.3%
6 89305
 
4.3%
3 74141
 
3.6%
4 71285
 
3.4%
Other values (3) 93871
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2071212
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 451090
21.8%
9 375304
18.1%
- 289583
14.0%
0 255834
12.4%
8 133815
 
6.5%
7 127284
 
6.1%
2 109700
 
5.3%
6 89305
 
4.3%
3 74141
 
3.6%
4 71285
 
3.4%
Other values (3) 93871
 
4.5%

startDayOfYear
Text

Missing 

Distinct: 366
Distinct (%): 0.2%
Missing: 571939
Missing (%): 78.9%
Memory size: 5.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 2.836395336
Min length: 1

Characters and Unicode

Total characters: 432746
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 23
2nd row: 267
3rd row: 230
4th row: 288
5th row: 100
ValueCountFrequency (%)
60 3645
 
2.4%
212 3066
 
2.0%
243 2888
 
1.9%
181 2290
 
1.5%
151 2068
 
1.4%
304 1900
 
1.2%
213 1765
 
1.2%
120 1640
 
1.1%
273 1383
 
0.9%
244 1217
 
0.8%
Other values (356) 130707
85.7%

Most occurring characters

ValueCountFrequency (%)
2 95911
22.2%
1 86225
19.9%
3 48550
11.2%
0 34306
 
7.9%
4 30194
 
7.0%
9 29540
 
6.8%
6 28135
 
6.5%
5 27414
 
6.3%
8 26265
 
6.1%
7 26206
 
6.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 432746
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 95911
22.2%
1 86225
19.9%
3 48550
11.2%
0 34306
 
7.9%
4 30194
 
7.0%
9 29540
 
6.8%
6 28135
 
6.5%
5 27414
 
6.3%
8 26265
 
6.1%
7 26206
 
6.1%

Most occurring scripts

ValueCountFrequency (%)
Common 432746
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 95911
22.2%
1 86225
19.9%
3 48550
11.2%
0 34306
 
7.9%
4 30194
 
7.0%
9 29540
 
6.8%
6 28135
 
6.5%
5 27414
 
6.3%
8 26265
 
6.1%
7 26206
 
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 432746
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 95911
22.2%
1 86225
19.9%
3 48550
11.2%
0 34306
 
7.9%
4 30194
 
7.0%
9 29540
 
6.8%
6 28135
 
6.5%
5 27414
 
6.3%
8 26265
 
6.1%
7 26206
 
6.1%

endDayOfYear
Text

Missing 

Distinct: 366
Distinct (%): 0.2%
Missing: 571953
Missing (%): 78.9%
Memory size: 5.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 2.837606109
Min length: 1

Characters and Unicode

Total characters: 432891
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 23
2nd row: 267
3rd row: 230
4th row: 288
5th row: 100
ValueCountFrequency (%)
60 3687
 
2.4%
243 3058
 
2.0%
212 2958
 
1.9%
151 2041
 
1.3%
181 2016
 
1.3%
304 1825
 
1.2%
120 1813
 
1.2%
213 1760
 
1.2%
273 1430
 
0.9%
244 1424
 
0.9%
Other values (356) 130543
85.6%

Most occurring characters

ValueCountFrequency (%)
2 96077
22.2%
1 85473
19.7%
3 48226
11.1%
0 34296
 
7.9%
4 30948
 
7.1%
9 29109
 
6.7%
6 28569
 
6.6%
5 27645
 
6.4%
7 26568
 
6.1%
8 25980
 
6.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 432891
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 96077
22.2%
1 85473
19.7%
3 48226
11.1%
0 34296
 
7.9%
4 30948
 
7.1%
9 29109
 
6.7%
6 28569
 
6.6%
5 27645
 
6.4%
7 26568
 
6.1%
8 25980
 
6.0%

Most occurring scripts

ValueCountFrequency (%)
Common 432891
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 96077
22.2%
1 85473
19.7%
3 48226
11.1%
0 34296
 
7.9%
4 30948
 
7.1%
9 29109
 
6.7%
6 28569
 
6.6%
5 27645
 
6.4%
7 26568
 
6.1%
8 25980
 
6.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 432891
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 96077
22.2%
1 85473
19.7%
3 48226
11.1%
0 34296
 
7.9%
4 30948
 
7.1%
9 29109
 
6.7%
6 28569
 
6.6%
5 27645
 
6.4%
7 26568
 
6.1%
8 25980
 
6.0%

year
Text

Missing 

Distinct: 191
Distinct (%): 0.1%
Missing: 453741
Missing (%): 62.6%
Memory size: 5.5 MiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 1083068
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): < 0.1%

Sample

1st row: 1985
2nd row: 1974
3rd row: 1980
4th row: 1963
5th row: 1956
ValueCountFrequency (%)
1910 7846
 
2.9%
1991 7769
 
2.9%
1980 7431
 
2.7%
1981 7192
 
2.7%
1982 7174
 
2.6%
1971 6769
 
2.5%
1976 6488
 
2.4%
1964 5815
 
2.1%
1973 5778
 
2.1%
1984 5612
 
2.1%
Other values (181) 202893
74.9%

Most occurring characters

ValueCountFrequency (%)
1 322145
29.7%
9 319500
29.5%
8 89146
 
8.2%
7 77505
 
7.2%
6 58473
 
5.4%
0 54161
 
5.0%
4 44737
 
4.1%
5 40639
 
3.8%
2 38510
 
3.6%
3 38252
 
3.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1083068
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 322145
29.7%
9 319500
29.5%
8 89146
 
8.2%
7 77505
 
7.2%
6 58473
 
5.4%
0 54161
 
5.0%
4 44737
 
4.1%
5 40639
 
3.8%
2 38510
 
3.6%
3 38252
 
3.5%

Most occurring scripts

ValueCountFrequency (%)
Common 1083068
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 322145
29.7%
9 319500
29.5%
8 89146
 
8.2%
7 77505
 
7.2%
6 58473
 
5.4%
0 54161
 
5.0%
4 44737
 
4.1%
5 40639
 
3.8%
2 38510
 
3.6%
3 38252
 
3.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1083068
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 322145
29.7%
9 319500
29.5%
8 89146
 
8.2%
7 77505
 
7.2%
6 58473
 
5.4%
0 54161
 
5.0%
4 44737
 
4.1%
5 40639
 
3.8%
2 38510
 
3.6%
3 38252
 
3.5%

month
Text

Missing 

Distinct: 12
Distinct (%): < 0.1%
Missing: 571556
Missing (%): 78.9%
Memory size: 5.5 MiB

Length

Max length: 2
Median length: 1
Mean length: 1.158729536
Min length: 1

Characters and Unicode

Total characters: 177230
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 9
3rd row: 8
4th row: 10
5th row: 4
ValueCountFrequency (%)
8 25708
16.8%
7 25619
16.7%
6 15211
9.9%
5 14666
9.6%
10 14523
9.5%
9 14275
9.3%
4 11358
7.4%
2 8535
 
5.6%
3 8472
 
5.5%
11 6678
 
4.4%
Other values (2) 7907
 
5.2%

Most occurring characters

ValueCountFrequency (%)
1 35786
20.2%
8 25708
14.5%
7 25619
14.5%
6 15211
8.6%
5 14666
8.3%
0 14523
8.2%
9 14275
 
8.1%
2 11612
 
6.6%
4 11358
 
6.4%
3 8472
 
4.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 177230
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 35786
20.2%
8 25708
14.5%
7 25619
14.5%
6 15211
8.6%
5 14666
8.3%
0 14523
8.2%
9 14275
 
8.1%
2 11612
 
6.6%
4 11358
 
6.4%
3 8472
 
4.8%

Most occurring scripts

ValueCountFrequency (%)
Common 177230
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 35786
20.2%
8 25708
14.5%
7 25619
14.5%
6 15211
8.6%
5 14666
8.3%
0 14523
8.2%
9 14275
 
8.1%
2 11612
 
6.6%
4 11358
 
6.4%
3 8472
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 177230
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 35786
20.2%
8 25708
14.5%
7 25619
14.5%
6 15211
8.6%
5 14666
8.3%
0 14523
8.2%
9 14275
 
8.1%
2 11612
 
6.6%
4 11358
 
6.4%
3 8472
 
4.8%

day
Text

Missing 

Distinct: 31
Distinct (%): < 0.1%
Missing: 593848
Missing (%): 82.0%
Memory size: 5.5 MiB

Length

Max length: 2
Median length: 2
Mean length: 1.719868361
Min length: 1

Characters and Unicode

Total characters: 224718
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 23
2nd row: 24
3rd row: 18
4th row: 14
5th row: 9
ValueCountFrequency (%)
17 5517
 
4.2%
16 5029
 
3.8%
18 5015
 
3.8%
13 4668
 
3.6%
23 4653
 
3.6%
14 4622
 
3.5%
20 4591
 
3.5%
8 4550
 
3.5%
15 4473
 
3.4%
11 4420
 
3.4%
Other values (21) 83122
63.6%

Most occurring characters

ValueCountFrequency (%)
1 61429
27.3%
2 53857
24.0%
3 19502
 
8.7%
7 13732
 
6.1%
8 13721
 
6.1%
6 13069
 
5.8%
0 12986
 
5.8%
4 12423
 
5.5%
9 12062
 
5.4%
5 11937
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 224718
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 61429
27.3%
2 53857
24.0%
3 19502
 
8.7%
7 13732
 
6.1%
8 13721
 
6.1%
6 13069
 
5.8%
0 12986
 
5.8%
4 12423
 
5.5%
9 12062
 
5.4%
5 11937
 
5.3%

Most occurring scripts

ValueCountFrequency (%)
Common 224718
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 61429
27.3%
2 53857
24.0%
3 19502
 
8.7%
7 13732
 
6.1%
8 13721
 
6.1%
6 13069
 
5.8%
0 12986
 
5.8%
4 12423
 
5.5%
9 12062
 
5.4%
5 11937
 
5.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 224718
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 61429
27.3%
2 53857
24.0%
3 19502
 
8.7%
7 13732
 
6.1%
8 13721
 
6.1%
6 13069
 
5.8%
0 12986
 
5.8%
4 12423
 
5.5%
9 12062
 
5.4%
5 11937
 
5.3%

verbatimEventDate
Text

Missing 

Distinct: 17805
Distinct (%): 6.4%
Missing: 445814
Missing (%): 61.5%
Memory size: 5.5 MiB
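As a quick consistency check on the figures above, the two percentages can be recomputed from the counts. This sketch assumes, based only on the numbers in this report, that Missing (%) is taken over all 724508 observations while Distinct (%) is taken over the non-missing values; the variable names are illustrative.

```python
# Counts copied from the report; the names are illustrative.
n_observations = 724508  # "Number of observations" in the overview
n_missing = 445814       # verbatimEventDate: Missing
n_distinct = 17805       # verbatimEventDate: Distinct

n_present = n_observations - n_missing

# Assumption: Missing (%) is relative to all rows,
# Distinct (%) is relative to non-missing rows only.
missing_pct = round(100 * n_missing / n_observations, 1)
distinct_pct = round(100 * n_distinct / n_present, 1)

print(missing_pct, distinct_pct)  # 61.5 6.4
```

Under that reading both values match the report, which suggests the distinct ratio ignores missing cells.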

Length

Max length: 61
Median length: 11
Mean length: 11.41229808
Min length: 4

Characters and Unicode

Total characters: 3180539
Distinct characters: 69
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5871
Unique (%): 2.1%

Sample

1st row: 23 JAN 1985
2nd row: April, 1928
3rd row: -- --- 1980
4th row: -- --- 1963
5th row: -- --- 1956
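The category counts reported for this variable can be approximated with Python's standard `unicodedata` module, which exposes the general category of each code point. The sketch below is illustrative only: `category_counts` and `CATEGORY_NAMES` are hypothetical names, the mapping covers just the categories that appear in this report, and the input is limited to three of the sample rows above. Scripts and blocks are not covered, since the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category, skipping missing values."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not in the mapping.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Illustrative input: three sample rows plus one missing cell.
sample = ["23 JAN 1985", "April, 1928", "-- --- 1980", None]
counts = category_counts(sample)
for name, n in counts.most_common():
    print(name, n)
```

On these three rows, digits dominate and dashes and spaces follow, which mirrors the ordering of the full-column category table.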
ValueCountFrequency (%)
235730
28.9%
aug 23677
 
2.9%
jul 22916
 
2.8%
summer 20031
 
2.5%
jun 14619
 
1.8%
may 14325
 
1.8%
oct 14287
 
1.7%
to 13955
 
1.7%
sep 13176
 
1.6%
apr 10764
 
1.3%
Other values (1210) 433163
53.0%
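The word table above looks like a count of lowercased, whitespace-separated tokens, and its unlabeled first row would be consistent with tokens made only of punctuation (such as `--`) being reduced to an empty string. That is an inference from the output, not a confirmed description of how the report was generated. A minimal sketch of such a tokenization, with the hypothetical helper `word_counts`:

```python
import string
from collections import Counter

def word_counts(values):
    """Count lowercased whitespace-separated tokens with edge punctuation stripped."""
    strip_chars = string.punctuation + string.whitespace
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for token in value.lower().split():
            # A token such as "--" becomes "" after stripping.
            counts[token.strip(strip_chars)] += 1
    return counts

# Illustrative input: three sample rows plus one missing cell.
sample = ["23 JAN 1985", "April, 1928", "-- --- 1980", None]
counts = word_counts(sample)
print(counts.most_common(3))
```

With this approach the placeholder dashes in values like `-- --- 1980` collapse into a single empty-string entry, which may explain why the most frequent "word" has no visible label.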

Most occurring characters

ValueCountFrequency (%)
- 633590
19.9%
(space) 537949
16.9%
1 382844
12.0%
9 314473
9.9%
8 105770
 
3.3%
0 101858
 
3.2%
7 96225
 
3.0%
2 94879
 
3.0%
6 69663
 
2.2%
A 63864
 
2.0%
Other values (59) 779424
24.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1340357
42.1%
Dash Punctuation 633590
19.9%
Space Separator 537949
16.9%
Uppercase Letter 491521
 
15.5%
Lowercase Letter 169648
 
5.3%
Other Punctuation 6422
 
0.2%
Math Symbol 1026
 
< 0.1%
Open Punctuation 13
 
< 0.1%
Close Punctuation 13
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
m 40530
23.9%
u 32141
18.9%
e 26707
15.7%
r 24584
14.5%
t 7049
 
4.2%
a 5225
 
3.1%
l 4565
 
2.7%
g 3709
 
2.2%
n 3604
 
2.1%
p 3590
 
2.1%
Other values (13) 17944
10.6%
Uppercase Letter
ValueCountFrequency (%)
A 63864
13.0%
U 61193
12.4%
J 48266
 
9.8%
O 36480
 
7.4%
S 35414
 
7.2%
T 28143
 
5.7%
N 24509
 
5.0%
P 23974
 
4.9%
E 23721
 
4.8%
G 23661
 
4.8%
Other values (11) 122296
24.9%
Decimal Number
ValueCountFrequency (%)
1 382844
28.6%
9 314473
23.5%
8 105770
 
7.9%
0 101858
 
7.6%
7 96225
 
7.2%
2 94879
 
7.1%
6 69663
 
5.2%
3 60386
 
4.5%
4 58552
 
4.4%
5 55707
 
4.2%
Other Punctuation
ValueCountFrequency (%)
, 3733
58.1%
. 1309
 
20.4%
' 650
 
10.1%
/ 634
 
9.9%
? 92
 
1.4%
; 2
 
< 0.1%
& 1
 
< 0.1%
* 1
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
| 1017
99.1%
+ 5
 
0.5%
~ 4
 
0.4%
Dash Punctuation
ValueCountFrequency (%)
- 633590
100.0%
Space Separator
ValueCountFrequency (%)
(space) 537949
100.0%
Open Punctuation
ValueCountFrequency (%)
( 13
100.0%
Close Punctuation
ValueCountFrequency (%)
) 13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2519370
79.2%
Latin 661169
 
20.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 63864
 
9.7%
U 61193
 
9.3%
J 48266
 
7.3%
m 40530
 
6.1%
O 36480
 
5.5%
S 35414
 
5.4%
u 32141
 
4.9%
T 28143
 
4.3%
e 26707
 
4.0%
r 24584
 
3.7%
Other values (34) 263847
39.9%
Common
ValueCountFrequency (%)
- 633590
25.1%
(space) 537949
21.4%
1 382844
15.2%
9 314473
12.5%
8 105770
 
4.2%
0 101858
 
4.0%
7 96225
 
3.8%
2 94879
 
3.8%
6 69663
 
2.8%
3 60386
 
2.4%
Other values (15) 121733
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3180539
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 633590
19.9%
(space) 537949
16.9%
1 382844
12.0%
9 314473
9.9%
8 105770
 
3.3%
0 101858
 
3.2%
7 96225
 
3.0%
2 94879
 
3.0%
6 69663
 
2.2%
A 63864
 
2.0%
Other values (59) 779424
24.5%

locationID
Text

Missing 

Distinct: 66560
Distinct (%): 17.1%
Missing: 335037
Missing (%): 46.2%
Memory size: 5.5 MiB

Length

Max length: 61
Median length: 59
Mean length: 5.757204002
Min length: 1

Characters and Unicode

Total characters: 2242264
Distinct characters: 81
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 40451
Unique (%): 10.4%

Sample

1st row: 1612
2nd row: 06
3rd row: USGS LOC M533
4th row: 42246
5th row: 707A
ValueCountFrequency (%)
42246 30863
 
6.4%
35k 30551
 
6.3%
loc 19929
 
4.1%
sta 7656
 
1.6%
d 5640
 
1.2%
site 4020
 
0.8%
40193 3269
 
0.7%
leg 3132
 
0.7%
olson 2904
 
0.6%
41142 2897
 
0.6%
Other values (59519) 370823
77.0%

Most occurring characters

ValueCountFrequency (%)
2 252324
 
11.3%
1 209625
 
9.3%
4 194523
 
8.7%
3 152357
 
6.8%
0 140257
 
6.3%
5 136706
 
6.1%
6 130433
 
5.8%
7 107242
 
4.8%
8 99787
 
4.5%
9 93127
 
4.2%
Other values (71) 725883
32.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1516381
67.6%
Uppercase Letter 531863
 
23.7%
Space Separator 92213
 
4.1%
Dash Punctuation 52032
 
2.3%
Other Punctuation 28932
 
1.3%
Lowercase Letter 15132
 
0.7%
Math Symbol 3062
 
0.1%
Close Punctuation 1336
 
0.1%
Open Punctuation 1313
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
O 51448
 
9.7%
L 50984
 
9.6%
C 46019
 
8.7%
S 44241
 
8.3%
A 41228
 
7.8%
E 37168
 
7.0%
K 36506
 
6.9%
T 30011
 
5.6%
I 25951
 
4.9%
N 20969
 
3.9%
Other values (16) 147338
27.7%
Lowercase Letter
ValueCountFrequency (%)
e 2360
15.6%
a 1816
12.0%
g 1802
11.9%
t 1447
9.6%
o 1201
7.9%
c 1136
7.5%
i 1026
6.8%
s 789
 
5.2%
b 707
 
4.7%
n 562
 
3.7%
Other values (16) 2286
15.1%
Other Punctuation
ValueCountFrequency (%)
. 13863
47.9%
, 10529
36.4%
* 2055
 
7.1%
/ 1776
 
6.1%
' 442
 
1.5%
# 178
 
0.6%
; 41
 
0.1%
? 34
 
0.1%
: 7
 
< 0.1%
" 6
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
2 252324
16.6%
1 209625
13.8%
4 194523
12.8%
3 152357
10.0%
0 140257
9.2%
5 136706
9.0%
6 130433
8.6%
7 107242
7.1%
8 99787
 
6.6%
9 93127
 
6.1%
Math Symbol
ValueCountFrequency (%)
+ 3039
99.2%
= 23
 
0.8%
Close Punctuation
ValueCountFrequency (%)
) 1335
99.9%
] 1
 
0.1%
Open Punctuation
ValueCountFrequency (%)
( 1304
99.3%
[ 9
 
0.7%
Space Separator
ValueCountFrequency (%)
(space) 92213
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 52032
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1695269
75.6%
Latin 546995
 
24.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
O 51448
 
9.4%
L 50984
 
9.3%
C 46019
 
8.4%
S 44241
 
8.1%
A 41228
 
7.5%
E 37168
 
6.8%
K 36506
 
6.7%
T 30011
 
5.5%
I 25951
 
4.7%
N 20969
 
3.8%
Other values (42) 162470
29.7%
Common
ValueCountFrequency (%)
2 252324
14.9%
1 209625
12.4%
4 194523
11.5%
3 152357
9.0%
0 140257
8.3%
5 136706
8.1%
6 130433
7.7%
7 107242
6.3%
8 99787
 
5.9%
9 93127
 
5.5%
Other values (19) 178888
10.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2242264
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 252324
 
11.3%
1 209625
 
9.3%
4 194523
 
8.7%
3 152357
 
6.8%
0 140257
 
6.3%
5 136706
 
6.1%
6 130433
 
5.8%
7 107242
 
4.8%
8 99787
 
4.5%
9 93127
 
4.2%
Other values (71) 725883
32.4%

higherGeography
Text

Missing 

Distinct: 4708
Distinct (%): 0.8%
Missing: 148417
Missing (%): 20.5%
Memory size: 5.5 MiB

Length

Max length: 111
Median length: 97
Mean length: 42.17362361
Min length: 4

Characters and Unicode

Total characters: 24295845
Distinct characters: 68
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1213
Unique (%): 0.2%

Sample

1st row: North America, United States, Florida
2nd row: Africa, Kenya, Marsabit
3rd row: North America, United States, Nevada, Pershing County
4th row: Cuba, Camaguey Prov
5th row: North America, United States, North Carolina, Beaufort County
ValueCountFrequency (%)
north 537307
16.4%
america 480121
14.7%
united 421781
12.9%
states 421705
12.9%
county 259124
 
7.9%
carolina 46843
 
1.4%
canada 38942
 
1.2%
texas 38273
 
1.2%
colorado 35917
 
1.1%
beaufort 33680
 
1.0%
Other values (2951) 959718
29.3%

Most occurring characters

ValueCountFrequency (%)
(space) 2697320
 
11.1%
t 2343978
 
9.6%
a 2051368
 
8.4%
e 1823223
 
7.5%
i 1571709
 
6.5%
r 1497295
 
6.2%
o 1387848
 
5.7%
, 1279367
 
5.3%
n 1260166
 
5.2%
s 766919
 
3.2%
Other values (58) 7616652
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 17040948
70.1%
Uppercase Letter 3272221
 
13.5%
Space Separator 2697320
 
11.1%
Other Punctuation 1284183
 
5.3%
Dash Punctuation 1169
 
< 0.1%
Open Punctuation 2
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 2343978
13.8%
a 2051368
12.0%
e 1823223
10.7%
i 1571709
9.2%
r 1497295
8.8%
o 1387848
8.1%
n 1260166
7.4%
s 766919
 
4.5%
h 662498
 
3.9%
c 650930
 
3.8%
Other values (24) 3025014
17.8%
Uppercase Letter
ValueCountFrequency (%)
N 590551
18.0%
A 571156
17.5%
C 498307
15.2%
S 484309
14.8%
U 430602
13.2%
B 108340
 
3.3%
M 87750
 
2.7%
O 60025
 
1.8%
T 59534
 
1.8%
P 52139
 
1.6%
Other values (16) 329508
10.1%
Other Punctuation
ValueCountFrequency (%)
, 1279367
99.6%
. 3038
 
0.2%
' 1757
 
0.1%
? 21
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 2697320
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1169
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 20313169
83.6%
Common 3982676
 
16.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 2343978
 
11.5%
a 2051368
 
10.1%
e 1823223
 
9.0%
i 1571709
 
7.7%
r 1497295
 
7.4%
o 1387848
 
6.8%
n 1260166
 
6.2%
s 766919
 
3.8%
h 662498
 
3.3%
c 650930
 
3.2%
Other values (50) 6297235
31.0%
Common
ValueCountFrequency (%)
(space) 2697320
67.7%
, 1279367
32.1%
. 3038
 
0.1%
' 1757
 
< 0.1%
- 1169
 
< 0.1%
? 21
 
< 0.1%
( 2
 
< 0.1%
) 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24288672
> 99.9%
None 7173
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 2697320
 
11.1%
t 2343978
 
9.7%
a 2051368
 
8.4%
e 1823223
 
7.5%
i 1571709
 
6.5%
r 1497295
 
6.2%
o 1387848
 
5.7%
, 1279367
 
5.3%
n 1260166
 
5.2%
s 766919
 
3.2%
Other values (50) 7609479
31.3%
None
ValueCountFrequency (%)
ó 3473
48.4%
í 2116
29.5%
á 1037
 
14.5%
é 539
 
7.5%
ñ 4
 
0.1%
è 2
 
< 0.1%
ä 1
 
< 0.1%
ú 1
 
< 0.1%

continent
Text

Missing 

Distinct: 44
Distinct (%): < 0.1%
Missing: 210428
Missing (%): 29.0%
Memory size: 5.5 MiB

Length

Max length: 36
Median length: 13
Mean length: 13.19896709
Min length: 4

Characters and Unicode

Total characters: 6785325
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): < 0.1%

Sample

1st row: North America
2nd row: Africa
3rd row: North America
4th row: North America
5th row: North America
ValueCountFrequency (%)
north 491990
47.1%
america 480118
46.0%
ocean 26667
 
2.6%
atlantic 13621
 
1.3%
south 9893
 
0.9%
pacific 8356
 
0.8%
indian 4034
 
0.4%
africa 3468
 
0.3%
oceania 2870
 
0.3%
europe 1626
 
0.2%
Other values (7) 1509
 
0.1%

Most occurring characters

ValueCountFrequency (%)
r 977899
14.4%
c 544584
8.0%
a 542896
8.0%
(space) 530072
7.8%
t 529855
7.8%
i 522205
7.7%
e 511408
7.5%
o 503636
7.4%
h 502009
7.4%
A 498588
7.3%
Other values (16) 1122173
16.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5209576
76.8%
Uppercase Letter 1044152
 
15.4%
Space Separator 530072
 
7.8%
Other Punctuation 1525
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 977899
18.8%
c 544584
10.5%
a 542896
10.4%
t 529855
10.2%
i 522205
10.0%
e 511408
9.8%
o 503636
9.7%
h 502009
9.6%
m 480119
9.2%
n 51386
 
1.0%
Other values (6) 43579
 
0.8%
Uppercase Letter
ValueCountFrequency (%)
A 498588
47.8%
N 491990
47.1%
O 29537
 
2.8%
S 10020
 
1.0%
P 8356
 
0.8%
I 4034
 
0.4%
E 1626
 
0.2%
T 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 530072
100.0%
Other Punctuation
ValueCountFrequency (%)
, 1525
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6253728
92.2%
Common 531597
 
7.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 977899
15.6%
c 544584
8.7%
a 542896
8.7%
t 529855
8.5%
i 522205
8.4%
e 511408
8.2%
o 503636
8.1%
h 502009
8.0%
A 498588
8.0%
N 491990
7.9%
Other values (14) 628658
10.1%
Common
ValueCountFrequency (%)
(space) 530072
99.7%
, 1525
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6785325
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 977899
14.4%
c 544584
8.0%
a 542896
8.0%
(space) 530072
7.8%
t 529855
7.8%
i 522205
7.7%
e 511408
7.5%
o 503636
7.4%
h 502009
7.4%
A 498588
7.3%
Other values (16) 1122173
16.5%

waterBody
Text

Missing 

Distinct: 172
Distinct (%): 0.6%
Missing: 696851
Missing (%): 96.2%
Memory size: 5.5 MiB

Length

Max length: 61
Median length: 54
Mean length: 21.95758759
Min length: 8

Characters and Unicode

Total characters: 607281
Distinct characters: 49
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 58
Unique (%): 0.2%

Sample

1st row: North Atlantic Ocean
2nd row: North Pacific Ocean
3rd row: North Atlantic Ocean, Caribbean Sea
4th row: North Atlantic Ocean
5th row: North Atlantic Ocean
ValueCountFrequency (%)
ocean 26667
28.1%
north 18835
19.9%
atlantic 13621
14.4%
pacific 8356
 
8.8%
sea 5778
 
6.1%
indian 4034
 
4.3%
south 2993
 
3.2%
timor 2479
 
2.6%
of 2181
 
2.3%
gulf 2067
 
2.2%
Other values (146) 7758
 
8.2%

Most occurring characters

ValueCountFrequency (%)
(space) 67112
11.1%
a 66029
10.9%
c 60399
9.9%
n 52729
 
8.7%
t 51240
 
8.4%
i 42959
 
7.1%
e 39252
 
6.5%
o 28732
 
4.7%
O 27050
 
4.5%
r 26329
 
4.3%
Other values (39) 145450
24.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 439588
72.4%
Uppercase Letter 92948
 
15.3%
Space Separator 67112
 
11.1%
Other Punctuation 7633
 
1.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 66029
15.0%
c 60399
13.7%
n 52729
12.0%
t 51240
11.7%
i 42959
9.8%
e 39252
8.9%
o 28732
6.5%
r 26329
 
6.0%
h 22202
 
5.1%
l 16619
 
3.8%
Other values (15) 33098
7.5%
Uppercase Letter
ValueCountFrequency (%)
O 27050
29.1%
N 18947
20.4%
A 14632
15.7%
S 9530
 
10.3%
P 8558
 
9.2%
I 4100
 
4.4%
M 2579
 
2.8%
T 2567
 
2.8%
G 2317
 
2.5%
C 1788
 
1.9%
Other values (12) 880
 
0.9%
Space Separator
ValueCountFrequency (%)
(space) 67112
100.0%
Other Punctuation
ValueCountFrequency (%)
, 7633
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 532536
87.7%
Common 74745
 
12.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 66029
12.4%
c 60399
11.3%
n 52729
9.9%
t 51240
9.6%
i 42959
 
8.1%
e 39252
 
7.4%
o 28732
 
5.4%
O 27050
 
5.1%
r 26329
 
4.9%
h 22202
 
4.2%
Other values (37) 115615
21.7%
Common
ValueCountFrequency (%)
(space) 67112
89.8%
, 7633
 
10.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 607281
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 67112
11.1%
a 66029
10.9%
c 60399
9.9%
n 52729
 
8.7%
t 51240
 
8.4%
i 42959
 
7.1%
e 39252
 
6.5%
o 28732
 
4.7%
O 27050
 
4.5%
r 26329
 
4.3%
Other values (39) 145450
24.0%

islandGroup
Text

Missing 

Distinct: 33
Distinct (%): 4.1%
Missing: 723710
Missing (%): 99.9%
Memory size: 5.5 MiB

Length

Max length: 25
Median length: 24
Mean length: 16.78571429
Min length: 5

Characters and Unicode

Total characters: 13395
Distinct characters: 46
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): 1.6%

Sample

1st row: Mariana Islands
2nd row: Northern Mariana Islands
3rd row: Gilbert Islands
4th row: Gilbert Islands
5th row: Aleutian Islands
ValueCountFrequency (%)
islands 765
44.5%
marshall 241
 
14.0%
mariana 155
 
9.0%
gilbert 135
 
7.9%
northern 134
 
7.8%
marianas 120
 
7.0%
solomon 21
 
1.2%
ryukyu 18
 
1.0%
hawaiian 18
 
1.0%
antilles 15
 
0.9%
Other values (26) 97
 
5.6%

Most occurring characters

ValueCountFrequency (%)
a 2202
16.4%
s 1936
14.5%
l 1461
10.9%
n 1270
9.5%
r 960
7.2%
(space) 921
6.9%
d 800
 
6.0%
I 765
 
5.7%
M 527
 
3.9%
i 498
 
3.7%
Other values (36) 2055
15.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10752
80.3%
Uppercase Letter 1720
 
12.8%
Space Separator 921
 
6.9%
Other Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 2202
20.5%
s 1936
18.0%
l 1461
13.6%
n 1270
11.8%
r 960
8.9%
d 800
 
7.4%
i 498
 
4.6%
h 376
 
3.5%
e 374
 
3.5%
t 298
 
2.8%
Other values (13) 577
 
5.4%
Uppercase Letter
ValueCountFrequency (%)
I 765
44.5%
M 527
30.6%
N 140
 
8.1%
G 135
 
7.8%
A 25
 
1.5%
L 24
 
1.4%
S 24
 
1.4%
H 18
 
1.0%
R 18
 
1.0%
C 11
 
0.6%
Other values (11) 33
 
1.9%
Space Separator
ValueCountFrequency (%)
(space) 921
100.0%
Other Punctuation
ValueCountFrequency (%)
. 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12472
93.1%
Common 923
 
6.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 2202
17.7%
s 1936
15.5%
l 1461
11.7%
n 1270
10.2%
r 960
7.7%
d 800
 
6.4%
I 765
 
6.1%
M 527
 
4.2%
i 498
 
4.0%
h 376
 
3.0%
Other values (34) 1677
13.4%
Common
ValueCountFrequency (%)
(space) 921
99.8%
. 2
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13395
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 2202
16.4%
s 1936
14.5%
l 1461
10.9%
n 1270
9.5%
r 960
7.2%
(space) 921
6.9%
d 800
 
6.0%
I 765
 
5.7%
M 527
 
3.9%
i 498
 
3.7%
Other values (36) 2055
15.3%

island
Text

Missing 

Distinct: 87
Distinct (%): 0.9%
Missing: 714401
Missing (%): 98.6%
Memory size: 5.5 MiB

Length

Max length: 21
Median length: 4
Mean length: 6.015335906
Min length: 3

Characters and Unicode

Total characters: 60797
Distinct characters: 50
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 38
Unique (%): 0.4%

Sample

1st row: Oahu
2nd row: Oahu
3rd row: Oahu
4th row: Animasola Island
5th row: Molokai
ValueCountFrequency (%)
oahu 5926
51.1%
molokai 2218
 
19.1%
saint 944
 
8.1%
helena 938
 
8.1%
atoll 241
 
2.1%
saipan 132
 
1.1%
guam 129
 
1.1%
onotoa 116
 
1.0%
martha's 108
 
0.9%
vineyard 108
 
0.9%
Other values (91) 728
 
6.3%

Most occurring characters

ValueCountFrequency (%)
a 11360
18.7%
u 6232
10.3%
h 6099
10.0%
O 6043
9.9%
o 5165
8.5%
i 4062
 
6.7%
l 3813
 
6.3%
n 2689
 
4.4%
k 2476
 
4.1%
M 2342
 
3.9%
Other values (40) 10516
17.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 47612
78.3%
Uppercase Letter 11591
 
19.1%
Space Separator 1481
 
2.4%
Other Punctuation 109
 
0.2%
Dash Punctuation 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 11360
23.9%
u 6232
13.1%
h 6099
12.8%
o 5165
10.8%
i 4062
 
8.5%
l 3813
 
8.0%
n 2689
 
5.6%
k 2476
 
5.2%
e 2309
 
4.8%
t 1709
 
3.6%
Other values (16) 1698
 
3.6%
Uppercase Letter
ValueCountFrequency (%)
O 6043
52.1%
M 2342
 
20.2%
S 1177
 
10.2%
H 941
 
8.1%
A 273
 
2.4%
G 140
 
1.2%
B 138
 
1.2%
E 125
 
1.1%
V 121
 
1.0%
I 89
 
0.8%
Other values (11) 202
 
1.7%
Space Separator
ValueCountFrequency (%)
(space) 1481
100.0%
Other Punctuation
ValueCountFrequency (%)
' 109
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 59203
97.4%
Common 1594
 
2.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 11360
19.2%
u 6232
10.5%
h 6099
10.3%
O 6043
10.2%
o 5165
8.7%
i 4062
 
6.9%
l 3813
 
6.4%
n 2689
 
4.5%
k 2476
 
4.2%
M 2342
 
4.0%
Other values (37) 8922
15.1%
Common
ValueCountFrequency (%)
(space) 1481
92.9%
' 109
 
6.8%
- 4
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 60794
> 99.9%
None 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 11360
18.7%
u 6232
10.3%
h 6099
10.0%
O 6043
9.9%
o 5165
8.5%
i 4062
 
6.7%
l 3813
 
6.3%
n 2689
 
4.4%
k 2476
 
4.1%
M 2342
 
3.9%
Other values (38) 10513
17.3%
None
ValueCountFrequency (%)
ñ 2
66.7%
é 1
33.3%

country
Text

Missing 

Distinct: 227
Distinct (%): < 0.1%
Missing: 173269
Missing (%): 23.9%
Memory size: 5.5 MiB

Length

Max length: 44
Median length: 13
Mean length: 11.8822108
Min length: 4

Characters and Unicode

Total characters: 6549938
Distinct characters: 57
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 39
Unique (%): < 0.1%

Sample

1st row: United States
2nd row: Kenya
3rd row: United States
4th row: Cuba
5th row: United States
ValueCountFrequency (%)
united 421781
42.0%
states 421705
42.0%
canada 38942
 
3.9%
panama 8607
 
0.9%
republic 6480
 
0.6%
dominican 6290
 
0.6%
islands 4307
 
0.4%
mexico 3812
 
0.4%
colombia 3579
 
0.4%
france 3529
 
0.4%
Other values (228) 84524
 
8.4%

Most occurring characters

ValueCountFrequency (%)
t 1291649
19.7%
e 891107
13.6%
a 672519
10.3%
n 536738
8.2%
i 496752
 
7.6%
d 485872
 
7.4%
s 453446
 
6.9%
(space) 452317
 
6.9%
S 427898
 
6.5%
U 422899
 
6.5%
Other values (47) 418741
 
6.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5095180
77.8%
Uppercase Letter 1001497
 
15.3%
Space Separator 452317
 
6.9%
Other Punctuation 942
 
< 0.1%
Open Punctuation 1
 
< 0.1%
Close Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 1291649
25.4%
e 891107
17.5%
a 672519
13.2%
n 536738
10.5%
i 496752
 
9.7%
d 485872
 
9.5%
s 453446
 
8.9%
c 41278
 
0.8%
l 37380
 
0.7%
o 35955
 
0.7%
Other values (17) 152484
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
S 427898
42.7%
U 422899
42.2%
C 51338
 
5.1%
P 16560
 
1.7%
R 12128
 
1.2%
I 10645
 
1.1%
A 10000
 
1.0%
M 6468
 
0.6%
D 6444
 
0.6%
B 5765
 
0.6%
Other values (15) 31352
 
3.1%
Other Punctuation
ValueCountFrequency (%)
, 940
99.8%
. 2
 
0.2%
Space Separator
ValueCountFrequency (%)
(space) 452317
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6096677
93.1%
Common 453261
 
6.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 1291649
21.2%
e 891107
14.6%
a 672519
11.0%
n 536738
8.8%
i 496752
 
8.1%
d 485872
 
8.0%
s 453446
 
7.4%
S 427898
 
7.0%
U 422899
 
6.9%
C 51338
 
0.8%
Other values (42) 366459
 
6.0%
Common
ValueCountFrequency (%)
(space) 452317
99.8%
, 940
 
0.2%
. 2
 
< 0.1%
( 1
 
< 0.1%
) 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6549937
> 99.9%
None 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 1291649
19.7%
e 891107
13.6%
a 672519
10.3%
n 536738
8.2%
i 496752
 
7.6%
d 485872
 
7.4%
s 453446
 
6.9%
(space) 452317
 
6.9%
S 427898
 
6.5%
U 422899
 
6.5%
Other values (46) 418740
 
6.4%
None
ValueCountFrequency (%)
é 1
100.0%

stateProvince
Text

Missing 

Distinct: 892
Distinct (%): 0.2%
Missing: 226462
Missing (%): 31.3%
Memory size: 5.5 MiB

Length

Max length: 25
Median length: 23
Mean length: 8.789222281
Min length: 3

Characters and Unicode

Total characters: 4377437
Distinct characters: 64
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 236
Unique (%): < 0.1%

Sample

1st row: Florida
2nd row: Marsabit
3rd row: Nevada
4th row: Camaguey Prov
5th row: North Carolina
ValueCountFrequency (%)
carolina 46813
 
7.5%
north 45129
 
7.2%
texas 38253
 
6.1%
colorado 35917
 
5.8%
california 32474
 
5.2%
columbia 32203
 
5.2%
british 32085
 
5.1%
alaska 28545
 
4.6%
new 23155
 
3.7%
wyoming 22778
 
3.6%
Other values (878) 287106
46.0%

Most occurring characters

ValueCountFrequency (%)
a 622536
14.2%
i 445132
 
10.2%
o 412678
 
9.4%
r 299951
 
6.9%
n 262321
 
6.0%
l 249350
 
5.7%
s 213346
 
4.9%
e 190372
 
4.3%
C 155417
 
3.6%
t 143584
 
3.3%
Other values (54) 1382750
31.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3624857
82.8%
Uppercase Letter 625183
 
14.3%
Space Separator 126412
 
2.9%
Dash Punctuation 508
 
< 0.1%
Other Punctuation 475
 
< 0.1%
Open Punctuation 1
 
< 0.1%
Close Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 622536
17.2%
i 445132
12.3%
o 412678
11.4%
r 299951
8.3%
n 262321
 
7.2%
l 249350
 
6.9%
s 213346
 
5.9%
e 190372
 
5.3%
t 143584
 
4.0%
h 114639
 
3.2%
Other values (22) 670948
18.5%
Uppercase Letter
ValueCountFrequency (%)
C 155417
24.9%
N 87902
14.1%
M 48444
 
7.7%
T 47635
 
7.6%
A 45155
 
7.2%
B 36744
 
5.9%
W 32086
 
5.1%
H 20814
 
3.3%
O 19325
 
3.1%
I 17859
 
2.9%
Other values (16) 113802
18.2%
Other Punctuation
ValueCountFrequency (%)
. 425
89.5%
' 50
 
10.5%
Space Separator
ValueCountFrequency (%)
(space) 126412
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 508
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4250040
97.1%
Common 127397
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 622536
14.6%
i 445132
 
10.5%
o 412678
 
9.7%
r 299951
 
7.1%
n 262321
 
6.2%
l 249350
 
5.9%
s 213346
 
5.0%
e 190372
 
4.5%
C 155417
 
3.7%
t 143584
 
3.4%
Other values (48) 1255353
29.5%
Common
ValueCountFrequency (%)
(space) 126412
99.2%
- 508
 
0.4%
. 425
 
0.3%
' 50
 
< 0.1%
( 1
 
< 0.1%
) 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4371514
99.9%
None 5923
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 622536
14.2%
i 445132
 
10.2%
o 412678
 
9.4%
r 299951
 
6.9%
n 262321
 
6.0%
l 249350
 
5.7%
s 213346
 
4.9%
e 190372
 
4.4%
C 155417
 
3.6%
t 143584
 
3.3%
Other values (48) 1376827
31.5%
None
ValueCountFrequency (%)
ó 2622
44.3%
í 1945
32.8%
á 1034
 
17.5%
é 319
 
5.4%
è 2
 
< 0.1%
ñ 1
 
< 0.1%

county
Text

Missing 

Distinct: 1997
Distinct (%): 0.7%
Missing: 454433
Missing (%): 62.7%
Memory size: 5.5 MiB

Length

Max length: 34
Median length: 29
Mean length: 14.2528779
Min length: 3

Characters and Unicode

Total characters: 3849346
Distinct characters: 65
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 393
Unique (%): 0.1%

Sample

1st row: Pershing County
2nd row: Beaufort County
3rd row: Brewster County
4th row: Los Angeles County
5th row: Honolulu County
ValueCountFrequency (%)
county 259124
45.6%
beaufort 33592
 
5.9%
brewster 15677
 
2.8%
maui 10401
 
1.8%
los 8883
 
1.6%
angeles 8865
 
1.6%
honolulu 5926
 
1.0%
san 4953
 
0.9%
lincoln 4346
 
0.8%
culberson 4132
 
0.7%
Other values (1945) 212334
37.4%

Most occurring characters

ValueCountFrequency (%)
o 423340
11.0%
n 401510
10.4%
t 375302
9.7%
u 352655
9.2%
(space) 298158
 
7.7%
C 289740
 
7.5%
y 279783
 
7.3%
e 215178
 
5.6%
a 186491
 
4.8%
r 177010
 
4.6%
Other values (55) 850179
22.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2976107
77.3%
Uppercase Letter 570194
 
14.8%
Space Separator 298158
 
7.7%
Other Punctuation 4230
 
0.1%
Dash Punctuation 657
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 423340
14.2%
n 401510
13.5%
t 375302
12.6%
u 352655
11.8%
y 279783
9.4%
e 215178
7.2%
a 186491
6.3%
r 177010
5.9%
l 100058
 
3.4%
s 96459
 
3.2%
Other values (23) 368321
12.4%
Uppercase Letter
ValueCountFrequency (%)
C 289740
50.8%
B 65415
 
11.5%
M 27388
 
4.8%
S 25040
 
4.4%
L 22655
 
4.0%
P 16991
 
3.0%
A 16627
 
2.9%
H 14879
 
2.6%
D 12691
 
2.2%
W 9829
 
1.7%
Other values (16) 68939
 
12.1%
Other Punctuation
ValueCountFrequency (%)
. 2609
61.7%
' 1598
37.8%
? 21
 
0.5%
, 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 298158
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 657
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3546301
92.1%
Common 303045
 
7.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 423340
11.9%
n 401510
11.3%
t 375302
10.6%
u 352655
9.9%
C 289740
 
8.2%
y 279783
 
7.9%
e 215178
 
6.1%
a 186491
 
5.3%
r 177010
 
5.0%
l 100058
 
2.8%
Other values (49) 745234
21.0%
Common
ValueCountFrequency (%)
(space) 298158
98.4%
. 2609
 
0.9%
' 1598
 
0.5%
- 657
 
0.2%
? 21
 
< 0.1%
, 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3848100
> 99.9%
None 1246
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 423340
11.0%
n 401510
10.4%
t 375302
9.8%
u 352655
9.2%
298158
 
7.7%
C 289740
 
7.5%
y 279783
 
7.3%
e 215178
 
5.6%
a 186491
 
4.8%
r 177010
 
4.6%
Other values (48) 848933
22.1%
None
ValueCountFrequency (%)
ó 851
68.3%
é 218
 
17.5%
í 171
 
13.7%
á 3
 
0.2%
ä 1
 
0.1%
ñ 1
 
0.1%
ú 1
 
0.1%

locality
Text

Missing 

Distinct 31755
Distinct (%) 19.4%
Missing 560871
Missing (%) 77.4%
Memory size 5.5 MiB

Length

Max length 471
Median length 316
Mean length 59.79365302
Min length 1

Characters and Unicode

Total characters 9784454
Distinct characters 100
Distinct categories 12
Distinct scripts 2
Distinct blocks 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 21088
Unique (%) 12.9%

Sample

1st row St. Andrew Bay
2nd row Nuevitas Bay, Between Nuevitas And Pastelillo
3rd row Palos Verdes Hills; East side of Deadman's Island
4th row North slope of San Pedro Hills, ravine S of harbor City, 4200 feet N and 53.5 degrees E from 342-foot hill, 100 feet up ravine from end of Bellepoint Street (W98-30)
5th row Coyote Springs Valley; spring
ValueCountFrequency (%)
of 120156
 
7.0%
34919
 
2.0%
and 22265
 
1.3%
bay 19665
 
1.1%
the 18421
 
1.1%
on 17778
 
1.0%
from 16823
 
1.0%
n 16777
 
1.0%
feet 15757
 
0.9%
river 15334
 
0.9%
Other values (34131) 1421831
82.7%

Most occurring characters

ValueCountFrequency (%)
1556089
 
15.9%
e 696361
 
7.1%
a 667574
 
6.8%
o 563183
 
5.8%
n 459218
 
4.7%
t 454511
 
4.6%
r 411334
 
4.2%
i 400897
 
4.1%
l 325764
 
3.3%
s 321111
 
3.3%
Other values (90) 3928412
40.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5944612
60.8%
Space Separator 1556089
 
15.9%
Uppercase Letter 1178423
 
12.0%
Decimal Number 550644
 
5.6%
Other Punctuation 394583
 
4.0%
Dash Punctuation 53241
 
0.5%
Open Punctuation 40436
 
0.4%
Close Punctuation 40130
 
0.4%
Math Symbol 26252
 
0.3%
Connector Punctuation 35
 
< 0.1%
Other values (2) 9
 
< 0.1%
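
The category breakdown above is based on the Unicode general category of each character. A minimal sketch of how such counts could be reproduced with Python's standard `unicodedata` module follows; the helper name `category_counts` and the `CATEGORY_NAMES` lookup are illustrative assumptions, not part of the profiling tool.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes returned by
# unicodedata.category(); codes not listed here are reported as-is.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Sm": "Math Symbol",
    "Pc": "Connector Punctuation",
    "Sc": "Currency Symbol",
    "So": "Other Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category, skipping missing values."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Two of the sample locality strings plus one missing value.
sample = ["St. Andrew Bay", "Coyote Springs Valley; spring", None]
counts = category_counts(sample)
for name, n in counts.most_common():
    print(name, n)
```

Run over the full column, the same loop should yield one count per category, which can then be divided by the total character count to obtain the percentages.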

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 696361
11.7%
a 667574
11.2%
o 563183
 
9.5%
n 459218
 
7.7%
t 454511
 
7.6%
r 411334
 
6.9%
i 400897
 
6.7%
l 325764
 
5.5%
s 321111
 
5.4%
f 214145
 
3.6%
Other values (21) 1430514
24.1%
Uppercase Letter
ValueCountFrequency (%)
S 174349
14.8%
C 112608
 
9.6%
O 84502
 
7.2%
N 76103
 
6.5%
B 74870
 
6.4%
R 70202
 
6.0%
P 66766
 
5.7%
A 62224
 
5.3%
W 51082
 
4.3%
T 49542
 
4.2%
Other values (17) 356175
30.2%
Other Punctuation
ValueCountFrequency (%)
, 179506
45.5%
. 103955
26.3%
; 73054
18.5%
/ 19087
 
4.8%
' 7147
 
1.8%
: 4428
 
1.1%
# 4037
 
1.0%
" 1994
 
0.5%
? 703
 
0.2%
& 599
 
0.2%
Other values (5) 73
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 125210
22.7%
0 82093
14.9%
2 69469
12.6%
5 50957
9.3%
3 50931
9.2%
4 49415
 
9.0%
6 36615
 
6.6%
7 31244
 
5.7%
8 27594
 
5.0%
9 27116
 
4.9%
Math Symbol
ValueCountFrequency (%)
| 22235
84.7%
+ 2928
 
11.2%
= 1045
 
4.0%
± 36
 
0.1%
~ 8
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 37729
93.3%
{ 2081
 
5.1%
[ 626
 
1.5%
Close Punctuation
ValueCountFrequency (%)
) 37422
93.3%
} 2082
 
5.2%
] 626
 
1.6%
Currency Symbol
ValueCountFrequency (%)
$ 3
60.0%
2
40.0%
Space Separator
ValueCountFrequency (%)
1556089
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 53241
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 35
100.0%
Other Symbol
ValueCountFrequency (%)
° 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7123035
72.8%
Common 2661419
 
27.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 696361
 
9.8%
a 667574
 
9.4%
o 563183
 
7.9%
n 459218
 
6.4%
t 454511
 
6.4%
r 411334
 
5.8%
i 400897
 
5.6%
l 325764
 
4.6%
s 321111
 
4.5%
f 214145
 
3.0%
Other values (48) 2608937
36.6%
Common
ValueCountFrequency (%)
1556089
58.5%
, 179506
 
6.7%
1 125210
 
4.7%
. 103955
 
3.9%
0 82093
 
3.1%
; 73054
 
2.7%
2 69469
 
2.6%
- 53241
 
2.0%
5 50957
 
1.9%
3 50931
 
1.9%
Other values (32) 316914
 
11.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9784239
> 99.9%
None 213
 
< 0.1%
Currency Symbols 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1556089
 
15.9%
e 696361
 
7.1%
a 667574
 
6.8%
o 563183
 
5.8%
n 459218
 
4.7%
t 454511
 
4.6%
r 411334
 
4.2%
i 400897
 
4.1%
l 325764
 
3.3%
s 321111
 
3.3%
Other values (81) 3928197
40.1%
None
ValueCountFrequency (%)
ñ 93
43.7%
± 36
 
16.9%
à 36
 
16.9%
í 27
 
12.7%
á 14
 
6.6%
° 4
 
1.9%
é 2
 
0.9%
ö 1
 
0.5%
Currency Symbols
ValueCountFrequency (%)
2
100.0%

verbatimElevation
Text

Missing 

Distinct 7
Distinct (%) 3.6%
Missing 724311
Missing (%) > 99.9%
Memory size 5.5 MiB

Length

Max length 88
Median length 88
Mean length 81.14720812
Min length 8

Characters and Unicode

Total characters 15986
Distinct characters 55
Distinct categories 7
Distinct scripts 2
Distinct blocks 1

Unique

Unique 2
Unique (%) 1.0%

Sample

1st row Elevation for Rampart Cave derived from Google Earth by Dr. Jim Mead on 4 Decemeber 2023
2nd row Approx.450-500ft Above Base Of Fm
3rd row Elevation for Rampart Cave derived from Google Earth by Dr. Jim Mead on 4 Decemeber 2023
4th row Elevation for Rampart Cave derived from Google Earth by Dr. Jim Mead on 4 Decemeber 2023
5th row Elevation for Rampart Cave derived from Google Earth by Dr. Jim Mead on 4 Decemeber 2023
ValueCountFrequency (%)
elevation 161
 
5.5%
by 161
 
5.5%
2023 161
 
5.5%
decemeber 161
 
5.5%
4 161
 
5.5%
mead 161
 
5.5%
jim 161
 
5.5%
dr 161
 
5.5%
on 161
 
5.5%
earth 161
 
5.5%
Other values (38) 1300
44.7%

Most occurring characters

ValueCountFrequency (%)
2713
17.0%
e 1696
 
10.6%
r 1185
 
7.4%
o 1092
 
6.8%
a 1023
 
6.4%
m 656
 
4.1%
t 562
 
3.5%
v 533
 
3.3%
i 527
 
3.3%
d 497
 
3.1%
Other values (45) 5502
34.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10285
64.3%
Space Separator 2713
 
17.0%
Uppercase Letter 1740
 
10.9%
Decimal Number 968
 
6.1%
Other Punctuation 239
 
1.5%
Math Symbol 29
 
0.2%
Dash Punctuation 12
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1696
16.5%
r 1185
11.5%
o 1092
10.6%
a 1023
9.9%
m 656
 
6.4%
t 562
 
5.5%
v 533
 
5.2%
i 527
 
5.1%
d 497
 
4.8%
n 407
 
4.0%
Other values (13) 2107
20.5%
Uppercase Letter
ValueCountFrequency (%)
D 322
18.5%
E 322
18.5%
C 194
11.1%
M 185
10.6%
J 161
9.3%
G 161
9.3%
R 161
9.3%
A 64
 
3.7%
B 53
 
3.0%
O 25
 
1.4%
Other values (8) 92
 
5.3%
Decimal Number
ValueCountFrequency (%)
2 354
36.6%
0 209
21.6%
4 173
17.9%
3 161
16.6%
5 40
 
4.1%
1 25
 
2.6%
6 5
 
0.5%
8 1
 
0.1%
Other Punctuation
ValueCountFrequency (%)
. 196
82.0%
, 42
 
17.6%
/ 1
 
0.4%
Space Separator
ValueCountFrequency (%)
2713
100.0%
Math Symbol
ValueCountFrequency (%)
+ 29
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12025
75.2%
Common 3961
 
24.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1696
14.1%
r 1185
 
9.9%
o 1092
 
9.1%
a 1023
 
8.5%
m 656
 
5.5%
t 562
 
4.7%
v 533
 
4.4%
i 527
 
4.4%
d 497
 
4.1%
n 407
 
3.4%
Other values (31) 3847
32.0%
Common
ValueCountFrequency (%)
2713
68.5%
2 354
 
8.9%
0 209
 
5.3%
. 196
 
4.9%
4 173
 
4.4%
3 161
 
4.1%
, 42
 
1.1%
5 40
 
1.0%
+ 29
 
0.7%
1 25
 
0.6%
Other values (4) 19
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 15986
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2713
17.0%
e 1696
 
10.6%
r 1185
 
7.4%
o 1092
 
6.8%
a 1023
 
6.4%
m 656
 
4.1%
t 562
 
3.5%
v 533
 
3.3%
i 527
 
3.3%
d 497
 
3.1%
Other values (45) 5502
34.4%

verbatimDepth
Text

Missing 

Distinct 17
Distinct (%) 20.2%
Missing 724424
Missing (%) > 99.9%
Memory size 5.5 MiB

Length

Max length 14
Median length 10
Mean length 5.523809524
Min length 4

Characters and Unicode

Total characters 464
Distinct characters 40
Distinct categories 6
Distinct scripts 2
Distinct blocks 1

Unique

Unique 9
Unique (%) 10.7%

Sample

1st row reef
2nd row Beach
3rd row ?48 Ms
4th row Beach
5th row Intertidal
ValueCountFrequency (%)
reef 30
27.5%
beach 25
22.9%
low 9
 
8.3%
ms 8
 
7.3%
water 7
 
6.4%
48 6
 
5.5%
no.4 4
 
3.7%
mnb 3
 
2.8%
57ms 2
 
1.8%
25 2
 
1.8%
Other values (12) 13
11.9%

Most occurring characters

ValueCountFrequency (%)
e 96
20.7%
r 40
 
8.6%
a 37
 
8.0%
f 31
 
6.7%
c 26
 
5.6%
h 25
 
5.4%
25
 
5.4%
b 18
 
3.9%
o 13
 
2.8%
t 13
 
2.8%
Other values (30) 140
30.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 339
73.1%
Uppercase Letter 51
 
11.0%
Decimal Number 32
 
6.9%
Space Separator 25
 
5.4%
Other Punctuation 16
 
3.4%
Dash Punctuation 1
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 96
28.3%
r 40
11.8%
a 37
 
10.9%
f 31
 
9.1%
c 26
 
7.7%
h 25
 
7.4%
b 18
 
5.3%
o 13
 
3.8%
t 13
 
3.8%
s 10
 
2.9%
Other values (7) 30
 
8.8%
Uppercase Letter
ValueCountFrequency (%)
M 12
23.5%
B 10
19.6%
L 9
17.6%
W 8
15.7%
N 4
 
7.8%
F 2
 
3.9%
A 1
 
2.0%
S 1
 
2.0%
U 1
 
2.0%
C 1
 
2.0%
Other values (2) 2
 
3.9%
Decimal Number
ValueCountFrequency (%)
4 11
34.4%
8 8
25.0%
5 4
 
12.5%
7 3
 
9.4%
0 3
 
9.4%
2 2
 
6.2%
3 1
 
3.1%
Other Punctuation
ValueCountFrequency (%)
. 10
62.5%
? 6
37.5%
Space Separator
ValueCountFrequency (%)
25
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 390
84.1%
Common 74
 
15.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 96
24.6%
r 40
10.3%
a 37
 
9.5%
f 31
 
7.9%
c 26
 
6.7%
h 25
 
6.4%
b 18
 
4.6%
o 13
 
3.3%
t 13
 
3.3%
M 12
 
3.1%
Other values (19) 79
20.3%
Common
ValueCountFrequency (%)
25
33.8%
4 11
14.9%
. 10
 
13.5%
8 8
 
10.8%
? 6
 
8.1%
5 4
 
5.4%
7 3
 
4.1%
0 3
 
4.1%
2 2
 
2.7%
- 1
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 464
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 96
20.7%
r 40
 
8.6%
a 37
 
8.0%
f 31
 
6.7%
c 26
 
5.6%
h 25
 
5.4%
25
 
5.4%
b 18
 
3.9%
o 13
 
2.8%
t 13
 
2.8%
Other values (30) 140
30.2%

decimalLatitude
Text

Missing 

Distinct 34307
Distinct (%) 33.0%
Missing 620569
Missing (%) 85.7%
Memory size 5.5 MiB

Length

Max length 8
Median length 7
Mean length 6.719883778
Min length 3

Characters and Unicode

Total characters 698458
Distinct characters 12
Distinct categories 3
Distinct scripts 1
Distinct blocks 1

Unique

Unique 19066
Unique (%) 18.3%

Sample

1st row 30.1564
2nd row 36.9858
3rd row 31.9911
4th row 69.08
5th row 17.8883
ValueCountFrequency (%)
44.6458 1686
 
1.6%
17.5 673
 
0.6%
29.8119 329
 
0.3%
33.1767 323
 
0.3%
34.6405 307
 
0.3%
38.8295 287
 
0.3%
41.1458 279
 
0.3%
48.1104 243
 
0.2%
40.6184 235
 
0.2%
31.6767 227
 
0.2%
Other values (34049) 99350
95.6%
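
Because the report types this coordinate column as Text, any numeric use would first need a parsing step. The sketch below is one hedged way to do that in plain Python; `parse_coordinate` and its `limit` parameter are hypothetical names introduced here, and the choice to map unparseable or out-of-range values to `None` is an assumption rather than anything the report prescribes.

```python
def parse_coordinate(text, limit):
    """Parse a coordinate stored as text; return None if missing or invalid.

    `limit` is the largest valid absolute value: 90 for latitude,
    180 for longitude.
    """
    if text is None or not text.strip():
        return None
    try:
        value = float(text)
    except ValueError:
        return None
    # NaN and infinities fail this range check and are rejected as well.
    if not -limit <= value <= limit:
        return None
    return value


# Sample values from the report plus a few deliberately bad inputs.
latitudes = ["30.1564", "69.08", "", None, "not a number", "123.4"]
parsed = [parse_coordinate(v, 90) for v in latitudes]
print(parsed)
```

The same function can be reused for longitude by passing `180` as the limit.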

Most occurring characters

ValueCountFrequency (%)
. 103939
14.9%
3 93842
13.4%
4 66308
9.5%
5 65933
9.4%
8 57884
8.3%
1 55433
7.9%
7 55155
7.9%
6 54645
7.8%
2 54452
7.8%
9 45816
6.6%
Other values (2) 45051
6.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 588796
84.3%
Other Punctuation 103939
 
14.9%
Dash Punctuation 5723
 
0.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 93842
15.9%
4 66308
11.3%
5 65933
11.2%
8 57884
9.8%
1 55433
9.4%
7 55155
9.4%
6 54645
9.3%
2 54452
9.2%
9 45816
7.8%
0 39328
6.7%
Other Punctuation
ValueCountFrequency (%)
. 103939
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 5723
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 698458
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 103939
14.9%
3 93842
13.4%
4 66308
9.5%
5 65933
9.4%
8 57884
8.3%
1 55433
7.9%
7 55155
7.9%
6 54645
7.8%
2 54452
7.8%
9 45816
6.6%
Other values (2) 45051
6.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 698458
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 103939
14.9%
3 93842
13.4%
4 66308
9.5%
5 65933
9.4%
8 57884
8.3%
1 55433
7.9%
7 55155
7.9%
6 54645
7.8%
2 54452
7.8%
9 45816
6.6%
Other values (2) 45051
6.5%

decimalLongitude
Text

Missing 

Distinct 35344
Distinct (%) 34.0%
Missing 620569
Missing (%) 85.7%
Memory size 5.5 MiB

Length

Max length 9
Median length 8
Mean length 7.641020214
Min length 3

Characters and Unicode

Total characters 794200
Distinct characters 12
Distinct categories 3
Distinct scripts 1
Distinct blocks 1

Unique

Unique 19861
Unique (%) 19.1%

Sample

1st row -85.6439
2nd row -114.996
3rd row -80.7842
4th row -155.83
5th row -66.52
ValueCountFrequency (%)
123.908 1686
 
1.6%
95.0833 673
 
0.6%
103.252 329
 
0.3%
98.6878 321
 
0.3%
105.851 307
 
0.3%
76.8473 287
 
0.3%
115.358 279
 
0.3%
123.934 243
 
0.2%
108.207 235
 
0.2%
123.18 230
 
0.2%
Other values (35142) 99349
95.6%

Most occurring characters

ValueCountFrequency (%)
. 103939
13.1%
- 95620
12.0%
1 88364
11.1%
7 72540
9.1%
8 71709
9.0%
3 62429
7.9%
6 55880
7.0%
5 55457
7.0%
2 52919
6.7%
9 50099
6.3%
Other values (2) 85244
10.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 594641
74.9%
Other Punctuation 103939
 
13.1%
Dash Punctuation 95620
 
12.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 88364
14.9%
7 72540
12.2%
8 71709
12.1%
3 62429
10.5%
6 55880
9.4%
5 55457
9.3%
2 52919
8.9%
9 50099
8.4%
4 45122
7.6%
0 40122
6.7%
Other Punctuation
ValueCountFrequency (%)
. 103939
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 95620
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 794200
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 103939
13.1%
- 95620
12.0%
1 88364
11.1%
7 72540
9.1%
8 71709
9.0%
3 62429
7.9%
6 55880
7.0%
5 55457
7.0%
2 52919
6.7%
9 50099
6.3%
Other values (2) 85244
10.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 794200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 103939
13.1%
- 95620
12.0%
1 88364
11.1%
7 72540
9.1%
8 71709
9.0%
3 62429
7.9%
6 55880
7.0%
5 55457
7.0%
2 52919
6.7%
9 50099
6.3%
Other values (2) 85244
10.7%

geodeticDatum
Text

Missing 

Distinct 5
Distinct (%) < 0.1%
Missing 698201
Missing (%) 96.4%
Memory size 5.5 MiB

Length

Max length 18
Median length 18
Mean length 17.69483407
Min length 5

Characters and Unicode

Total characters 465498
Distinct characters 26
Distinct categories 7
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row WGS 84 (EPSG:4326)
2nd row WGS 84 (EPSG:4326)
3rd row WGS 84 (EPSG:4326)
4th row WGS 84 (EPSG:4326)
5th row WGS 84 (EPSG:4326)
ValueCountFrequency (%)
wgs 24628
32.1%
84 24628
32.1%
epsg:4326 24628
32.1%
nad27 561
 
0.7%
epsg:4267 561
 
0.7%
nad83 474
 
0.6%
epsg:4269 474
 
0.6%
wgs84 447
 
0.6%
not 197
 
0.3%
recorded 197
 
0.3%
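
The token counts above are consistent with five raw datum strings: "WGS 84 (EPSG:4326)", "NAD27 (EPSG:4267)", "NAD83 (EPSG:4269)", "WGS84" and "not recorded". Only the first appears verbatim in the sample, so the other four are an inference from the tokens. Under that assumption, a minimal sketch for normalising the column to numeric EPSG codes might look like this; `datum_to_epsg` and the `NAME_TO_EPSG` fallback table are hypothetical, and mapping a bare "WGS84" to 4326 is an assumption.

```python
import re

# Matches an explicit code such as "EPSG:4326" anywhere in the string.
EPSG_PATTERN = re.compile(r"EPSG:(\d+)", re.IGNORECASE)

# Assumed fallback for datum names that carry no explicit EPSG code.
NAME_TO_EPSG = {"WGS84": 4326, "WGS 84": 4326, "NAD27": 4267, "NAD83": 4269}


def datum_to_epsg(value):
    """Return the EPSG code for a geodeticDatum string, or None if unknown."""
    if value is None:
        return None
    match = EPSG_PATTERN.search(value)
    if match:
        return int(match.group(1))
    return NAME_TO_EPSG.get(value.strip().upper())


for raw in ["WGS 84 (EPSG:4326)", "NAD27 (EPSG:4267)", "WGS84", "not recorded"]:
    print(repr(raw), "->", datum_to_epsg(raw))
```

Values such as "not recorded" deliberately map to `None` so that they can be handled alongside genuinely missing datums.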

Most occurring characters

ValueCountFrequency (%)
G 50738
10.9%
S 50738
10.9%
4 50738
10.9%
50488
10.8%
2 26224
 
5.6%
) 25663
 
5.5%
( 25663
 
5.5%
E 25663
 
5.5%
P 25663
 
5.5%
: 25663
 
5.5%
Other values (16) 108257
23.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 180982
38.9%
Decimal Number 154872
33.3%
Space Separator 50488
 
10.8%
Close Punctuation 25663
 
5.5%
Open Punctuation 25663
 
5.5%
Other Punctuation 25663
 
5.5%
Lowercase Letter 2167
 
0.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
G 50738
28.0%
S 50738
28.0%
E 25663
14.2%
P 25663
14.2%
W 25075
13.9%
N 1035
 
0.6%
A 1035
 
0.6%
D 1035
 
0.6%
Decimal Number
ValueCountFrequency (%)
4 50738
32.8%
2 26224
16.9%
6 25663
16.6%
8 25549
16.5%
3 25102
16.2%
7 1122
 
0.7%
9 474
 
0.3%
Lowercase Letter
ValueCountFrequency (%)
o 394
18.2%
r 394
18.2%
e 394
18.2%
d 394
18.2%
n 197
9.1%
t 197
9.1%
c 197
9.1%
Space Separator
ValueCountFrequency (%)
50488
100.0%
Close Punctuation
ValueCountFrequency (%)
) 25663
100.0%
Open Punctuation
ValueCountFrequency (%)
( 25663
100.0%
Other Punctuation
ValueCountFrequency (%)
: 25663
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 282349
60.7%
Latin 183149
39.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
G 50738
27.7%
S 50738
27.7%
E 25663
14.0%
P 25663
14.0%
W 25075
13.7%
N 1035
 
0.6%
A 1035
 
0.6%
D 1035
 
0.6%
o 394
 
0.2%
r 394
 
0.2%
Other values (5) 1379
 
0.8%
Common
ValueCountFrequency (%)
4 50738
18.0%
50488
17.9%
2 26224
9.3%
) 25663
9.1%
( 25663
9.1%
: 25663
9.1%
6 25663
9.1%
8 25549
9.0%
3 25102
8.9%
7 1122
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 465498
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
G 50738
10.9%
S 50738
10.9%
4 50738
10.9%
50488
10.8%
2 26224
 
5.6%
) 25663
 
5.5%
( 25663
 
5.5%
E 25663
 
5.5%
P 25663
 
5.5%
: 25663
 
5.5%
Other values (16) 108257
23.3%

verbatimLatitude
Text

Missing 

Distinct 2
Distinct (%) 40.0%
Missing 724503
Missing (%) > 99.9%
Memory size 5.5 MiB

Length

Max length 10
Median length 9
Mean length 9.4
Min length 9

Characters and Unicode

Total characters 47
Distinct characters 9
Distinct categories 4
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 11 53.4 N
2nd row 11 53.4 N
3rd row 11 53.4 N
4th row 18 44.98 N
5th row 18 44.98 N
ValueCountFrequency (%)
n 5
33.3%
11 3
20.0%
53.4 3
20.0%
18 2
 
13.3%
44.98 2
 
13.3%
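
The five populated values look like whole degrees followed by decimal minutes and a hemisphere letter (for example "11 53.4 N"), although that reading is an assumption based only on the sample. Under that assumption, a conversion to signed decimal degrees could be sketched as follows; `dm_to_decimal` and `DM_PATTERN` are hypothetical names.

```python
import re

# Assumed layout: "<degrees> <decimal minutes> <hemisphere>", e.g. "11 53.4 N".
DM_PATTERN = re.compile(
    r"^\s*(\d{1,3})\s+(\d{1,2}(?:\.\d+)?)\s*([NSEW])\s*$", re.IGNORECASE
)


def dm_to_decimal(text):
    """Convert a degrees/decimal-minutes string to signed decimal degrees."""
    match = DM_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees = int(match.group(1))
    minutes = float(match.group(2))
    if minutes >= 60:
        raise ValueError(f"minutes out of range: {text!r}")
    value = degrees + minutes / 60
    # South and west are negative by convention.
    return -value if match.group(3).upper() in "SW" else value


for raw in ["11 53.4 N", "18 44.98 N", "60 07.78 E"]:
    print(raw, "->", round(dm_to_decimal(raw), 4))
```

The same helper covers the matching longitude strings, since the hemisphere letter determines the sign.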

Most occurring characters

ValueCountFrequency (%)
10
21.3%
1 8
17.0%
4 7
14.9%
. 5
10.6%
N 5
10.6%
8 4
 
8.5%
5 3
 
6.4%
3 3
 
6.4%
9 2
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 27
57.4%
Space Separator 10
 
21.3%
Other Punctuation 5
 
10.6%
Uppercase Letter 5
 
10.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 8
29.6%
4 7
25.9%
8 4
14.8%
5 3
 
11.1%
3 3
 
11.1%
9 2
 
7.4%
Space Separator
ValueCountFrequency (%)
10
100.0%
Other Punctuation
ValueCountFrequency (%)
. 5
100.0%
Uppercase Letter
ValueCountFrequency (%)
N 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 42
89.4%
Latin 5
 
10.6%

Most frequent character per script

Common
ValueCountFrequency (%)
10
23.8%
1 8
19.0%
4 7
16.7%
. 5
11.9%
8 4
 
9.5%
5 3
 
7.1%
3 3
 
7.1%
9 2
 
4.8%
Latin
ValueCountFrequency (%)
N 5
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 47
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
10
21.3%
1 8
17.0%
4 7
14.9%
. 5
10.6%
N 5
10.6%
8 4
 
8.5%
5 3
 
6.4%
3 3
 
6.4%
9 2
 
4.3%

verbatimLongitude
Text

Missing 

Distinct 2
Distinct (%) 40.0%
Missing 724503
Missing (%) > 99.9%
Memory size 5.5 MiB

Length

Max length 10
Median length 9
Mean length 9.4
Min length 9

Characters and Unicode

Total characters 47
Distinct characters 9
Distinct categories 4
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 48 14.7 E
2nd row 48 14.7 E
3rd row 48 14.7 E
4th row 60 07.78 E
5th row 60 07.78 E
ValueCountFrequency (%)
e 5
33.3%
48 3
20.0%
14.7 3
20.0%
60 2
 
13.3%
07.78 2
 
13.3%

Most occurring characters

ValueCountFrequency (%)
10
21.3%
7 7
14.9%
4 6
12.8%
8 5
10.6%
. 5
10.6%
E 5
10.6%
0 4
 
8.5%
1 3
 
6.4%
6 2
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 27
57.4%
Space Separator 10
 
21.3%
Other Punctuation 5
 
10.6%
Uppercase Letter 5
 
10.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
7 7
25.9%
4 6
22.2%
8 5
18.5%
0 4
14.8%
1 3
11.1%
6 2
 
7.4%
Space Separator
ValueCountFrequency (%)
10
100.0%
Other Punctuation
ValueCountFrequency (%)
. 5
100.0%
Uppercase Letter
ValueCountFrequency (%)
E 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 42
89.4%
Latin 5
 
10.6%

Most frequent character per script

Common
ValueCountFrequency (%)
10
23.8%
7 7
16.7%
4 6
14.3%
8 5
11.9%
. 5
11.9%
0 4
 
9.5%
1 3
 
7.1%
6 2
 
4.8%
Latin
ValueCountFrequency (%)
E 5
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 47
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
10
21.3%
7 7
14.9%
4 6
12.8%
8 5
10.6%
. 5
10.6%
E 5
10.6%
0 4
 
8.5%
1 3
 
6.4%
6 2
 
4.3%

verbatimCoordinateSystem
Text

Constant  Missing 

Distinct 1
Distinct (%) < 0.1%
Missing 654265
Missing (%) 90.3%
Memory size 5.5 MiB

Length

Max length 23
Median length 23
Mean length 23
Min length 23

Characters and Unicode

Total characters 1615589
Distinct characters 15
Distinct categories 3
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Degrees Minutes Seconds
2nd row Degrees Minutes Seconds
3rd row Degrees Minutes Seconds
4th row Degrees Minutes Seconds
5th row Degrees Minutes Seconds
ValueCountFrequency (%)
degrees 70243
33.3%
minutes 70243
33.3%
seconds 70243
33.3%
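
For a constant column such as this one, the summary figures can be cross-checked with simple arithmetic: the non-missing count is the number of observations minus the missing count, and the total character count is that figure times the length of the single value. The sketch below works this through; `check_constant_text_column` is a hypothetical helper, and `TOTAL_ROWS` is taken from the "Number of observations" figure in the report overview.

```python
TOTAL_ROWS = 724508  # "Number of observations" in the report overview


def check_constant_text_column(missing, value, total_rows=TOTAL_ROWS):
    """Derive non-missing count, missing share and total characters."""
    present = total_rows - missing
    return {
        "present": present,
        "missing_pct": round(100 * missing / total_rows, 1),
        "total_characters": present * len(value),
    }


stats = check_constant_text_column(654265, "Degrees Minutes Seconds")
print(stats)
```

The derived values can be compared directly against the Missing (%), Total characters and per-word counts reported for the column.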

Most occurring characters

ValueCountFrequency (%)
e 351215
21.7%
s 210729
13.0%
140486
 
8.7%
n 140486
 
8.7%
D 70243
 
4.3%
g 70243
 
4.3%
r 70243
 
4.3%
M 70243
 
4.3%
i 70243
 
4.3%
u 70243
 
4.3%
Other values (5) 351215
21.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1264374
78.3%
Uppercase Letter 210729
 
13.0%
Space Separator 140486
 
8.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 351215
27.8%
s 210729
16.7%
n 140486
 
11.1%
g 70243
 
5.6%
r 70243
 
5.6%
i 70243
 
5.6%
u 70243
 
5.6%
t 70243
 
5.6%
c 70243
 
5.6%
o 70243
 
5.6%
Uppercase Letter
ValueCountFrequency (%)
D 70243
33.3%
M 70243
33.3%
S 70243
33.3%
Space Separator
ValueCountFrequency (%)
140486
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1475103
91.3%
Common 140486
 
8.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 351215
23.8%
s 210729
14.3%
n 140486
 
9.5%
D 70243
 
4.8%
g 70243
 
4.8%
r 70243
 
4.8%
M 70243
 
4.8%
i 70243
 
4.8%
u 70243
 
4.8%
t 70243
 
4.8%
Other values (4) 280972
19.0%
Common
ValueCountFrequency (%)
140486
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1615589
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 351215
21.7%
s 210729
13.0%
140486
 
8.7%
n 140486
 
8.7%
D 70243
 
4.3%
g 70243
 
4.3%
r 70243
 
4.3%
M 70243
 
4.3%
i 70243
 
4.3%
u 70243
 
4.3%
Other values (5) 351215
21.7%

georeferenceProtocol
Text

Missing 

Distinct 19
Distinct (%) 0.1%
Missing 695012
Missing (%) 95.9%
Memory size 5.5 MiB

Length

Max length 81
Median length 43
Mean length 42.23633713
Min length 7

Characters and Unicode

Total characters 1245803
Distinct characters 50
Distinct categories 8
Distinct scripts 2
Distinct blocks 1

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row Georeferencing Quick Reference Guide (2020)
2nd row Georeferencing Quick Reference Guide (2020)
3rd row Georeferencing Quick Reference Guide (2020)
4th row Georeferencing Quick Reference Guide (2020)
5th row Georeferencing Quick Reference Guide (2020)
ValueCountFrequency (%)
georeferencing 26344
17.6%
guide 26344
17.6%
reference 24178
16.2%
2020 24178
16.2%
quick 24178
16.2%
biogeomancer 2166
 
1.4%
2006 2166
 
1.4%
august 2166
 
1.4%
consortium 2166
 
1.4%
for 2166
 
1.4%
Other values (32) 13421
9.0%

Most occurring characters

ValueCountFrequency (%)
e 237471
19.1%
119977
 
9.6%
r 87730
 
7.0%
i 84069
 
6.7%
n 82720
 
6.6%
c 81302
 
6.5%
u 58822
 
4.7%
G 54854
 
4.4%
0 52731
 
4.2%
f 52688
 
4.2%
Other values (40) 333439
26.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 844245
67.8%
Uppercase Letter 121633
 
9.8%
Space Separator 119977
 
9.6%
Decimal Number 105634
 
8.5%
Open Punctuation 24178
 
1.9%
Close Punctuation 24178
 
1.9%
Other Punctuation 5915
 
0.5%
Math Symbol 43
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 237471
28.1%
r 87730
 
10.4%
i 84069
 
10.0%
n 82720
 
9.8%
c 81302
 
9.6%
u 58822
 
7.0%
f 52688
 
6.2%
o 40962
 
4.9%
g 28625
 
3.4%
d 28111
 
3.3%
Other values (12) 61745
 
7.3%
Uppercase Letter
ValueCountFrequency (%)
G 54854
45.1%
Q 25508
21.0%
R 24645
20.3%
B 4332
 
3.6%
A 3450
 
2.8%
C 2537
 
2.1%
P 2195
 
1.8%
M 1338
 
1.1%
L 1299
 
1.1%
V 351
 
0.3%
Other values (6) 1124
 
0.9%
Decimal Number
ValueCountFrequency (%)
0 52731
49.9%
2 50522
47.8%
6 2166
 
2.1%
5 129
 
0.1%
4 43
 
< 0.1%
8 43
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 3205
54.2%
, 2710
45.8%
Space Separator
ValueCountFrequency (%)
119977
100.0%
Open Punctuation
ValueCountFrequency (%)
( 24178
100.0%
Close Punctuation
ValueCountFrequency (%)
) 24178
100.0%
Math Symbol
ValueCountFrequency (%)
+ 43
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 965878
77.5%
Common 279925
 
22.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 237471
24.6%
r 87730
 
9.1%
i 84069
 
8.7%
n 82720
 
8.6%
c 81302
 
8.4%
u 58822
 
6.1%
G 54854
 
5.7%
f 52688
 
5.5%
o 40962
 
4.2%
g 28625
 
3.0%
Other values (28) 156635
16.2%
Common
ValueCountFrequency (%)
119977
42.9%
0 52731
18.8%
2 50522
18.0%
( 24178
 
8.6%
) 24178
 
8.6%
. 3205
 
1.1%
, 2710
 
1.0%
6 2166
 
0.8%
5 129
 
< 0.1%
4 43
 
< 0.1%
Other values (2) 86
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1245803
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 237471
19.1%
119977
 
9.6%
r 87730
 
7.0%
i 84069
 
6.7%
n 82720
 
6.6%
c 81302
 
6.5%
u 58822
 
4.7%
G 54854
 
4.4%
0 52731
 
4.2%
f 52688
 
4.2%
Other values (40) 333439
26.8%

georeferenceRemarks
Text

Missing 

Distinct 2
Distinct (%) 40.0%
Missing 724503
Missing (%) > 99.9%
Memory size 5.5 MiB

Length

Max length 70
Median length 70
Mean length 58
Min length 10

Characters and Unicode

Total characters 290
Distinct characters 27
Distinct categories 4
Distinct scripts 2
Distinct blocks 1

Unique

Unique 1
Unique (%) 20.0%

Sample

1st row A; B; C; D
2nd row included in Jennifer Jett's Foram Bulk DB but not included in F Ledger
3rd row included in Jennifer Jett's Foram Bulk DB but not included in F Ledger
4th row included in Jennifer Jett's Foram Bulk DB but not included in F Ledger
5th row included in Jennifer Jett's Foram Bulk DB but not included in F Ledger
ValueCountFrequency (%)
included 8
14.3%
in 8
14.3%
jennifer 4
7.1%
jett's 4
7.1%
foram 4
7.1%
bulk 4
7.1%
db 4
7.1%
but 4
7.1%
not 4
7.1%
f 4
7.1%
Other values (5) 8
14.3%

Most occurring characters

ValueCountFrequency (%)
51
17.6%
n 28
 
9.7%
e 28
 
9.7%
i 20
 
6.9%
d 20
 
6.9%
u 16
 
5.5%
t 16
 
5.5%
r 12
 
4.1%
l 12
 
4.1%
B 9
 
3.1%
Other values (17) 78
26.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 196
67.6%
Space Separator 51
 
17.6%
Uppercase Letter 36
 
12.4%
Other Punctuation 7
 
2.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 28
14.3%
e 28
14.3%
i 20
10.2%
d 20
10.2%
u 16
8.2%
t 16
8.2%
r 12
6.1%
l 12
6.1%
c 8
 
4.1%
o 8
 
4.1%
Other values (7) 28
14.3%
Uppercase Letter
ValueCountFrequency (%)
B 9
25.0%
J 8
22.2%
F 8
22.2%
D 5
13.9%
L 4
11.1%
A 1
 
2.8%
C 1
 
2.8%
Other Punctuation
ValueCountFrequency (%)
' 4
57.1%
; 3
42.9%
Space Separator
ValueCountFrequency (%)
51
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 232
80.0%
Common 58
 
20.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 28
12.1%
e 28
12.1%
i 20
 
8.6%
d 20
 
8.6%
u 16
 
6.9%
t 16
 
6.9%
r 12
 
5.2%
l 12
 
5.2%
B 9
 
3.9%
J 8
 
3.4%
Other values (14) 63
27.2%
Common
ValueCountFrequency (%)
51
87.9%
' 4
 
6.9%
; 3
 
5.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 290
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
51
17.6%
n 28
 
9.7%
e 28
 
9.7%
i 20
 
6.9%
d 20
 
6.9%
u 16
 
5.5%
t 16
 
5.5%
r 12
 
4.1%
l 12
 
4.1%
B 9
 
3.1%
Other values (17) 78
26.9%
Distinct 10
Distinct (%) < 0.1%
Missing 220036
Missing (%) 30.4%
Memory size 5.5 MiB

Length

Max length 16
Median length 8
Mean length 8.387123567
Min length 8

Characters and Unicode

Total characters 4231069
Distinct characters 19
Distinct categories 2
Distinct scripts 1
Distinct blocks 1

Unique

Unique 2
Unique (%) < 0.1%

Sample

1st row Mesozoic
2nd row Cenozoic
3rd row Cenozoic
4th row Paleozoic
5th row Cenozoic
ValueCountFrequency (%)
cenozoic 261752
51.9%
paleozoic 194023
38.5%
mesozoic 48343
 
9.6%
precambrian 298
 
0.1%
mesoproterozoic 41
 
< 0.1%
neoproterozoic 7
 
< 0.1%
paleoproterozoic 4
 
< 0.1%
paleoarchean 3
 
< 0.1%
mesoarchean 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
o 1008448
23.8%
e 504528
11.9%
c 504472
11.9%
i 504468
11.9%
z 504170
11.9%
n 262054
 
6.2%
C 261752
 
6.2%
a 194634
 
4.6%
P 194327
 
4.6%
l 194030
 
4.6%
Other values (9) 98186
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3726598
88.1%
Uppercase Letter 504471
 
11.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 1008448
27.1%
e 504528
13.5%
c 504472
13.5%
i 504468
13.5%
z 504170
13.5%
n 262054
 
7.0%
a 194634
 
5.2%
l 194030
 
5.2%
s 48385
 
1.3%
r 704
 
< 0.1%
Other values (5) 705
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
C 261752
51.9%
P 194327
38.5%
M 48385
 
9.6%
N 7
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 4231069
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 1008448
23.8%
e 504528
11.9%
c 504472
11.9%
i 504468
11.9%
z 504170
11.9%
n 262054
 
6.2%
C 261752
 
6.2%
a 194634
 
4.6%
P 194327
 
4.6%
l 194030
 
4.6%
Other values (9) 98186
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4231069
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 1008448
23.8%
e 504528
11.9%
c 504472
11.9%
i 504468
11.9%
z 504170
11.9%
n 262054
 
6.2%
C 261752
 
6.2%
a 194634
 
4.6%
P 194327
 
4.6%
l 194030
 
4.6%
Other values (9) 98186
 
2.3%
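The category breakdowns above (Lowercase Letter, Uppercase Letter) correspond to Unicode general categories. As a minimal sketch, and not the profiler's actual implementation, Python's standard `unicodedata` module can reproduce such counts for a handful of values. The function name `category_counts` is hypothetical, and the input is only the five Sample rows, not the full column.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lu', 'Ll') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Sample rows copied from the report; this is not the full column.
sample = ["Mesozoic", "Cenozoic", "Cenozoic", "Paleozoic", "Cenozoic"]
counts = category_counts(sample)

# 'Ll' is Lowercase Letter, 'Lu' is Uppercase Letter.
print(counts["Ll"], counts["Lu"])
```

Each capitalised value contributes exactly one uppercase letter, which is why the Uppercase Letter total in the report tracks the number of non-missing values so closely.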
Distinct 5
Distinct (%) 0.1%
Missing 718163
Missing (%) 99.1%
Memory size 5.5 MiB

Length

Max length 15
Median length 8
Mean length 8.134121355
Min length 8

Characters and Unicode

Total characters 51611
Distinct characters 16
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row Paleozoic
2nd row Cenozoic
3rd row Mesozoic
4th row Cenozoic
5th row Cenozoic

Value Count Frequency (%)
cenozoic 5229 82.4%
paleozoic 826 13.0%
mesozoic 286 4.5%
neoproterozoic 3 < 0.1%
mesoproterozoic 1 < 0.1%

Most occurring characters

ValueCountFrequency (%)
o 12698
24.6%
e 6349
12.3%
z 6345
12.3%
i 6345
12.3%
c 6345
12.3%
C 5229
10.1%
n 5229
10.1%
P 826
 
1.6%
a 826
 
1.6%
l 826
 
1.6%
Other values (6) 593
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 45266
87.7%
Uppercase Letter 6345
 
12.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 12698
28.1%
e 6349
14.0%
z 6345
14.0%
i 6345
14.0%
c 6345
14.0%
n 5229
11.6%
a 826
 
1.8%
l 826
 
1.8%
s 287
 
0.6%
r 8
 
< 0.1%
Other values (2) 8
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
C 5229
82.4%
P 826
 
13.0%
M 287
 
4.5%
N 3
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 51611
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 12698
24.6%
e 6349
12.3%
z 6345
12.3%
i 6345
12.3%
c 6345
12.3%
C 5229
10.1%
n 5229
10.1%
P 826
 
1.6%
a 826
 
1.6%
l 826
 
1.6%
Other values (6) 593
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 51611
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 12698
24.6%
e 6349
12.3%
z 6345
12.3%
i 6345
12.3%
c 6345
12.3%
C 5229
10.1%
n 5229
10.1%
P 826
 
1.6%
a 826
 
1.6%
l 826
 
1.6%
Other values (6) 593
 
1.1%
Distinct 27
Distinct (%) < 0.1%
Missing 245750
Missing (%) 33.9%
Memory size 5.5 MiB

Length

Max length 13
Median length 10
Mean length 8.607453035
Min length 6

Characters and Unicode

Total characters 4120887
Distinct characters 35
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 4
Unique (%) < 0.1%

Sample

1st row Triassic
2nd row Paleogene
3rd row Neogene
4th row Permian
5th row Quaternary

Value Count Frequency (%)
paleogene 90464 18.9%
neogene 72075 15.1%
cambrian 48808 10.2%
recent 41336 8.6%
ordovician 34462 7.2%
cretaceous 34238 7.2%
permian 32455 6.8%
quaternary 27798 5.8%
devonian 27637 5.8%
mississippian 19734 4.1%
Other values (14) 49751 10.4%

Most occurring characters

ValueCountFrequency (%)
e 751141
18.2%
n 506768
12.3%
a 458678
11.1%
i 322536
 
7.8%
o 263741
 
6.4%
r 242986
 
5.9%
g 162539
 
3.9%
s 160613
 
3.9%
P 140533
 
3.4%
c 124669
 
3.0%
Other values (25) 986683
23.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3642156
88.4%
Uppercase Letter 478731
 
11.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 751141
20.6%
n 506768
13.9%
a 458678
12.6%
i 322536
8.9%
o 263741
 
7.2%
r 242986
 
6.7%
g 162539
 
4.5%
s 160613
 
4.4%
c 124669
 
3.4%
l 120100
 
3.3%
Other values (11) 528385
14.5%
Uppercase Letter
ValueCountFrequency (%)
P 140533
29.4%
C 84743
17.7%
N 72075
15.1%
R 41337
 
8.6%
O 34462
 
7.2%
Q 27798
 
5.8%
D 27637
 
5.8%
M 20068
 
4.2%
S 11625
 
2.4%
T 9097
 
1.9%
Other values (4) 9356
 
2.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4120887
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 751141
18.2%
n 506768
12.3%
a 458678
11.1%
i 322536
 
7.8%
o 263741
 
6.4%
r 242986
 
5.9%
g 162539
 
3.9%
s 160613
 
3.9%
P 140533
 
3.4%
c 124669
 
3.0%
Other values (25) 986683
23.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4120887
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 751141
18.2%
n 506768
12.3%
a 458678
11.1%
i 322536
 
7.8%
o 263741
 
6.4%
r 242986
 
5.9%
g 162539
 
3.9%
s 160613
 
3.9%
P 140533
 
3.4%
c 124669
 
3.0%
Other values (25) 986683
23.9%
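The summary figures for this variable are linked by simple arithmetic: the missing percentage is the missing count over the number of observations, and the mean length is the total character count over the non-missing values. The short check below is a sketch using the 724508 observations from the Overview and the figures reported for the variable above (Missing 245750, Total characters 4120887); the variable names are illustrative only.

```python
# Figures copied from the report; the names below are illustrative.
n_observations = 724508      # "Number of observations" in the Overview
n_missing = 245750           # "Missing" for this variable
total_characters = 4120887   # "Total characters" for this variable

n_present = n_observations - n_missing
missing_pct = 100 * n_missing / n_observations
mean_length = total_characters / n_present

print(f"{missing_pct:.1f}%")   # compare with "Missing (%)"
print(f"{mean_length:.6f}")    # compare with "Mean length"
```

The same check can be repeated for any other variable in the report by substituting its Missing and Total characters values.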
Distinct 15
Distinct (%) 0.2%
Missing 718167
Missing (%) 99.1%
Memory size 5.5 MiB

Length

Max length 13
Median length 10
Mean length 8.077905693
Min length 6

Characters and Unicode

Total characters 51222
Distinct characters 28
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row Devonian
2nd row Neogene
3rd row Cretaceous
4th row Quaternary
5th row Recent

Value Count Frequency (%)
neogene 3161 49.9%
paleogene 1404 22.1%
quaternary 668 10.5%
devonian 416 6.6%
cretaceous 185 2.9%
cambrian 161 2.5%
ordovician 137 2.2%
pennsylvanian 77 1.2%
recent 60 0.9%
silurian 30 0.5%
Other values (5) 42 0.7%

Most occurring characters

ValueCountFrequency (%)
e 15352
30.0%
n 6768
13.2%
o 5307
 
10.4%
g 4565
 
8.9%
a 4026
 
7.9%
N 3161
 
6.2%
r 1892
 
3.7%
l 1511
 
2.9%
P 1484
 
2.9%
i 1053
 
2.1%
Other values (18) 6103
 
11.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 44881
87.6%
Uppercase Letter 6341
 
12.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 15352
34.2%
n 6768
15.1%
o 5307
 
11.8%
g 4565
 
10.2%
a 4026
 
9.0%
r 1892
 
4.2%
l 1511
 
3.4%
i 1053
 
2.3%
t 914
 
2.0%
u 898
 
2.0%
Other values (8) 2595
 
5.8%
Uppercase Letter
ValueCountFrequency (%)
N 3161
49.9%
P 1484
23.4%
Q 668
 
10.5%
D 416
 
6.6%
C 348
 
5.5%
O 137
 
2.2%
R 60
 
0.9%
S 31
 
0.5%
T 23
 
0.4%
J 13
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 51222
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 15352
30.0%
n 6768
13.2%
o 5307
 
10.4%
g 4565
 
8.9%
a 4026
 
7.9%
N 3161
 
6.2%
r 1892
 
3.7%
l 1511
 
2.9%
P 1484
 
2.9%
i 1053
 
2.1%
Other values (18) 6103
 
11.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 51222
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 15352
30.0%
n 6768
13.2%
o 5307
 
10.4%
g 4565
 
8.9%
a 4026
 
7.9%
N 3161
 
6.2%
r 1892
 
3.7%
l 1511
 
2.9%
P 1484
 
2.9%
i 1053
 
2.1%
Other values (18) 6103
 
11.9%
Distinct 24
Distinct (%) < 0.1%
Missing 376914
Missing (%) 52.0%
Memory size 5.5 MiB

Length

Max length 13
Median length 11
Mean length 6.357434248
Min length 1

Characters and Unicode

Total characters 2209806
Distinct characters 32
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 4
Unique (%) < 0.1%

Sample

1st row Middle
2nd row Eocene
3rd row Pliocene
4th row Pleistocene
5th row Early

Value Count Frequency (%)
middle 68576 19.7%
eocene 66980 19.3%
late 57993 16.7%
miocene 39410 11.3%
early 37474 10.8%
pliocene 32039 9.2%
pleistocene 20013 5.8%
oligocene 15521 4.5%
paleocene 7752 2.2%
holocene 1481 0.4%
Other values (10) 355 0.1%

Most occurring characters

ValueCountFrequency (%)
e 520801
23.6%
o 184703
 
8.4%
n 183525
 
8.3%
c 183200
 
8.3%
l 183151
 
8.3%
i 175926
 
8.0%
d 137364
 
6.2%
M 107985
 
4.9%
E 104453
 
4.7%
a 104017
 
4.7%
Other values (22) 324681
14.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1862169
84.3%
Uppercase Letter 347612
 
15.7%
Other Punctuation 25
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 520801
28.0%
o 184703
 
9.9%
n 183525
 
9.9%
c 183200
 
9.8%
l 183151
 
9.8%
i 175926
 
9.4%
d 137364
 
7.4%
a 104017
 
5.6%
t 78031
 
4.2%
r 37590
 
2.0%
Other values (9) 73861
 
4.0%
Uppercase Letter
ValueCountFrequency (%)
M 107985
31.1%
E 104453
30.0%
P 59809
17.2%
L 58036
16.7%
O 15517
 
4.5%
H 1481
 
0.4%
G 195
 
0.1%
C 77
 
< 0.1%
D 27
 
< 0.1%
U 25
 
< 0.1%
Other values (2) 7
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 25
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2209781
> 99.9%
Common 25
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 520801
23.6%
o 184703
 
8.4%
n 183525
 
8.3%
c 183200
 
8.3%
l 183151
 
8.3%
i 175926
 
8.0%
d 137364
 
6.2%
M 107985
 
4.9%
E 104453
 
4.7%
a 104017
 
4.7%
Other values (21) 324656
14.7%
Common
ValueCountFrequency (%)
/ 25
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2209806
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 520801
23.6%
o 184703
 
8.4%
n 183525
 
8.3%
c 183200
 
8.3%
l 183151
 
8.3%
i 175926
 
8.0%
d 137364
 
6.2%
M 107985
 
4.9%
E 104453
 
4.7%
a 104017
 
4.7%
Other values (22) 324681
14.7%
Distinct 12
Distinct (%) 0.2%
Missing 718290
Missing (%) 99.1%
Memory size 5.5 MiB

Length

Max length 11
Median length 9
Mean length 7.33708588
Min length 4

Characters and Unicode

Total characters 45622
Distinct characters 21
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row Middle
2nd row Pliocene
3rd row Late
4th row Pleistocene
5th row Miocene

Value Count Frequency (%)
pliocene 2384 38.3%
eocene 1075 17.3%
miocene 759 12.2%
late 645 10.4%
pleistocene 645 10.4%
middle 364 5.9%
oligocene 188 3.0%
paleocene 97 1.6%
early 34 0.5%
holocene 14 0.2%
Other values (2) 13 0.2%

Most occurring characters

ValueCountFrequency (%)
e 12099
26.5%
o 5177
11.3%
n 5176
11.3%
c 5174
11.3%
i 4342
 
9.5%
l 3726
 
8.2%
P 3126
 
6.9%
t 1302
 
2.9%
M 1123
 
2.5%
E 1109
 
2.4%
Other values (11) 3268
 
7.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 39404
86.4%
Uppercase Letter 6218
 
13.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 12099
30.7%
o 5177
13.1%
n 5176
13.1%
c 5174
13.1%
i 4342
 
11.0%
l 3726
 
9.5%
t 1302
 
3.3%
a 777
 
2.0%
d 728
 
1.8%
s 645
 
1.6%
Other values (4) 258
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
P 3126
50.3%
M 1123
 
18.1%
E 1109
 
17.8%
L 646
 
10.4%
O 188
 
3.0%
H 14
 
0.2%
R 12
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 45622
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 12099
26.5%
o 5177
11.3%
n 5176
11.3%
c 5174
11.3%
i 4342
 
9.5%
l 3726
 
8.2%
P 3126
 
6.9%
t 1302
 
2.9%
M 1123
 
2.5%
E 1109
 
2.4%
Other values (11) 3268
 
7.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 45622
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 12099
26.5%
o 5177
11.3%
n 5176
11.3%
c 5174
11.3%
i 4342
 
9.5%
l 3726
 
8.2%
P 3126
 
6.9%
t 1302
 
2.9%
M 1123
 
2.5%
E 1109
 
2.4%
Other values (11) 3268
 
7.2%
Distinct 366
Distinct (%) 0.2%
Missing 562472
Missing (%) 77.6%
Memory size 5.5 MiB

Length

Max length 23
Median length 19
Mean length 9.036053716
Min length 4

Characters and Unicode

Total characters 1464166
Distinct characters 54
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 38
Unique (%) < 0.1%

Sample

1st row Anisian
2nd row Hemphillian
3rd row Middle
4th row Emsian
5th row Irvingtonian

Value Count Frequency (%)
hemphillian 19681 12.1%
middle 17380 10.7%
wasatchian 7037 4.3%
early 5466 3.4%
orellan 5085 3.1%
bridgerian 4799 2.9%
maastrichtian 4686 2.9%
campanian 4051 2.5%
chadronian 3871 2.4%
ypresian 3476 2.1%
Other values (350) 87399 53.6%

Most occurring characters

ValueCountFrequency (%)
a 228885
15.6%
n 195907
13.4%
i 190767
13.0%
e 105142
 
7.2%
l 96307
 
6.6%
r 75689
 
5.2%
d 61340
 
4.2%
o 52724
 
3.6%
h 47497
 
3.2%
s 40454
 
2.8%
Other values (44) 369454
25.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1300773
88.8%
Uppercase Letter 162483
 
11.1%
Space Separator 895
 
0.1%
Other Punctuation 13
 
< 0.1%
Decimal Number 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 228885
17.6%
n 195907
15.1%
i 190767
14.7%
e 105142
8.1%
l 96307
7.4%
r 75689
 
5.8%
d 61340
 
4.7%
o 52724
 
4.1%
h 47497
 
3.7%
s 40454
 
3.1%
Other values (16) 206061
15.8%
Uppercase Letter
ValueCountFrequency (%)
M 28152
17.3%
C 21480
13.2%
H 20672
12.7%
W 12315
7.6%
B 10522
 
6.5%
O 10358
 
6.4%
T 8937
 
5.5%
E 7395
 
4.6%
A 6493
 
4.0%
L 6455
 
4.0%
Other values (14) 29704
18.3%
Other Punctuation
ValueCountFrequency (%)
/ 12
92.3%
, 1
 
7.7%
Space Separator
ValueCountFrequency (%)
895
100.0%
Decimal Number
ValueCountFrequency (%)
4 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1463256
99.9%
Common 910
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 228885
15.6%
n 195907
13.4%
i 190767
13.0%
e 105142
 
7.2%
l 96307
 
6.6%
r 75689
 
5.2%
d 61340
 
4.2%
o 52724
 
3.6%
h 47497
 
3.2%
s 40454
 
2.8%
Other values (40) 368544
25.2%
Common
ValueCountFrequency (%)
895
98.4%
/ 12
 
1.3%
4 2
 
0.2%
, 1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1464166
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 228885
15.6%
n 195907
13.4%
i 190767
13.0%
e 105142
 
7.2%
l 96307
 
6.6%
r 75689
 
5.2%
d 61340
 
4.2%
o 52724
 
3.6%
h 47497
 
3.2%
s 40454
 
2.8%
Other values (44) 369454
25.2%
Distinct 35
Distinct (%) 1.5%
Missing 722133
Missing (%) 99.7%
Memory size 5.5 MiB

Length

Max length 13
Median length 8
Mean length 8.232
Min length 4

Characters and Unicode

Total characters 19551
Distinct characters 38
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 4
Unique (%) 0.2%

Sample

1st row Givetian
2nd row Turonian
3rd row Gelasian
4th row Gelasian
5th row Gelasian

Value Count Frequency (%)
lutetian 829 34.9%
zanclean 319 13.4%
tortonian 217 9.1%
gelasian 200 8.4%
maastrichtian 105 4.4%
late 98 4.1%
messinian 78 3.3%
thanetian 78 3.3%
ypresian 60 2.5%
langhian 58 2.4%
Other values (25) 333 14.0%

Most occurring characters

ValueCountFrequency (%)
a 3358
17.2%
n 3107
15.9%
t 2287
11.7%
i 2268
11.6%
e 1838
9.4%
L 1015
 
5.2%
u 862
 
4.4%
l 662
 
3.4%
o 553
 
2.8%
s 534
 
2.7%
Other values (28) 3067
15.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 17176
87.9%
Uppercase Letter 2375
 
12.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 3358
19.6%
n 3107
18.1%
t 2287
13.3%
i 2268
13.2%
e 1838
10.7%
u 862
 
5.0%
l 662
 
3.9%
o 553
 
3.2%
s 534
 
3.1%
r 515
 
3.0%
Other values (13) 1192
 
6.9%
Uppercase Letter
ValueCountFrequency (%)
L 1015
42.7%
Z 319
 
13.4%
T 297
 
12.5%
G 223
 
9.4%
M 196
 
8.3%
E 90
 
3.8%
Y 60
 
2.5%
P 53
 
2.2%
C 50
 
2.1%
B 32
 
1.3%
Other values (5) 40
 
1.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 19551
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 3358
17.2%
n 3107
15.9%
t 2287
11.7%
i 2268
11.6%
e 1838
9.4%
L 1015
 
5.2%
u 862
 
4.4%
l 662
 
3.4%
o 553
 
2.8%
s 534
 
2.7%
Other values (28) 3067
15.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 19551
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 3358
17.2%
n 3107
15.9%
t 2287
11.7%
i 2268
11.6%
e 1838
9.4%
L 1015
 
5.2%
u 862
 
4.4%
l 662
 
3.4%
o 553
 
2.8%
s 534
 
2.7%
Other values (28) 3067
15.7%

group
Text

Missing 

Distinct 557
Distinct (%) 0.6%
Missing 633218
Missing (%) 87.4%
Memory size 5.5 MiB

Length

Max length 29
Median length 28
Mean length 14.80891664
Min length 1

Characters and Unicode

Total characters 1351906
Distinct characters 57
Distinct categories 7
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 146
Unique (%) 0.2%

Sample

1st row Star Peak Group
2nd row Chesapeake Group
3rd row Keokuk Group
4th row Chesapeake Group
5th row Chesapeake Group

Value Count Frequency (%)
group 90331 46.7%
chesapeake 38410 19.9%
river 7802 4.0%
white 5751 3.0%
selma 3439 1.8%
kewanee 2702 1.4%
hamilton 2337 1.2%
osage 2256 1.2%
washita 1421 0.7%
pamunkey 1419 0.7%
Other values (577) 37508 19.4%

Most occurring characters

ValueCountFrequency (%)
e 166874
12.3%
p 131366
9.7%
a 118438
 
8.8%
r 115845
 
8.6%
o 113583
 
8.4%
102086
 
7.6%
u 98547
 
7.3%
G 90741
 
6.7%
s 54633
 
4.0%
h 50628
 
3.7%
Other values (47) 309165
22.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1056168
78.1%
Uppercase Letter 193474
 
14.3%
Space Separator 102086
 
7.6%
Other Punctuation 124
 
< 0.1%
Open Punctuation 21
 
< 0.1%
Close Punctuation 21
 
< 0.1%
Dash Punctuation 12
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 166874
15.8%
p 131366
12.4%
a 118438
11.2%
r 115845
11.0%
o 113583
10.8%
u 98547
9.3%
s 54633
 
5.2%
h 50628
 
4.8%
k 45139
 
4.3%
i 34291
 
3.2%
Other values (16) 126824
12.0%
Uppercase Letter
ValueCountFrequency (%)
G 90741
46.9%
C 43143
22.3%
R 9045
 
4.7%
W 8105
 
4.2%
S 6248
 
3.2%
M 4589
 
2.4%
P 4340
 
2.2%
K 3671
 
1.9%
O 3592
 
1.9%
H 3351
 
1.7%
Other values (15) 16649
 
8.6%
Other Punctuation
ValueCountFrequency (%)
. 88
71.0%
, 36
29.0%
Space Separator
ValueCountFrequency (%)
102086
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1249642
92.4%
Common 102264
 
7.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 166874
13.4%
p 131366
10.5%
a 118438
9.5%
r 115845
9.3%
o 113583
9.1%
u 98547
 
7.9%
G 90741
 
7.3%
s 54633
 
4.4%
h 50628
 
4.1%
k 45139
 
3.6%
Other values (41) 263848
21.1%
Common
ValueCountFrequency (%)
102086
99.8%
. 88
 
0.1%
, 36
 
< 0.1%
( 21
 
< 0.1%
) 21
 
< 0.1%
- 12
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1351906
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 166874
12.3%
p 131366
9.7%
a 118438
 
8.8%
r 115845
 
8.6%
o 113583
 
8.4%
102086
 
7.6%
u 98547
 
7.3%
G 90741
 
6.7%
s 54633
 
4.0%
h 50628
 
3.7%
Other values (47) 309165
22.9%
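The value-frequency table for `group` appears to count lowercased, whitespace-separated tokens rather than whole cell values, which is why `group` and `chesapeake` are listed as separate rows. The sketch below illustrates that kind of tokenization on the five Sample rows only. The lowercase-and-split behaviour is an assumption inferred from the table, not confirmed from the profiler's source, and `token_counts` is a hypothetical helper name.

```python
from collections import Counter


def token_counts(values):
    """Count lowercased, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


# Sample rows copied from the report; this is not the full column.
sample = [
    "Star Peak Group",
    "Chesapeake Group",
    "Keokuk Group",
    "Chesapeake Group",
    "Chesapeake Group",
]
counts = token_counts(sample)

for token, count in counts.most_common(2):
    print(token, count)
```

Because nearly every value ends in "Group", that token dominates the table and says little about the data; counting whole values would be more informative for this column.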

formation
Text

Missing 

Distinct 5419
Distinct (%) 1.5%
Missing 365706
Missing (%) 50.5%
Memory size 5.5 MiB

Length

Max length 46
Median length 38
Mean length 11.49027319
Min length 3

Characters and Unicode

Total characters 4122733
Distinct characters 66
Distinct categories 8
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1482
Unique (%) 0.4%

Sample

1st row Prida Fm
2nd row Yorktown Fm
3rd row Skinner Ranch Fm
4th row San Pedro Fm
5th row Grande Greve Fm

Value Count Frequency (%)
fm 259134 32.0%
river 44301 5.5%
ls 39737 4.9%
stephen 31376 3.9%
green 29207 3.6%
yorktown 23754 2.9%
unknown 18762 2.3%
sh 17735 2.2%
pungo 10262 1.3%
canyon 8111 1.0%
Other values (4425) 326422 40.4%

Most occurring characters

ValueCountFrequency (%)
449999
 
10.9%
e 361227
 
8.8%
n 317355
 
7.7%
m 288475
 
7.0%
F 271104
 
6.6%
r 245377
 
6.0%
o 238913
 
5.8%
a 212844
 
5.2%
i 166070
 
4.0%
t 160119
 
3.9%
Other values (56) 1411250
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2858690
69.3%
Uppercase Letter 809683
 
19.6%
Space Separator 449999
 
10.9%
Other Punctuation 3867
 
0.1%
Decimal Number 156
 
< 0.1%
Open Punctuation 135
 
< 0.1%
Close Punctuation 134
 
< 0.1%
Dash Punctuation 69
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 361227
12.6%
n 317355
11.1%
m 288475
10.1%
r 245377
 
8.6%
o 238913
 
8.4%
a 212844
 
7.4%
i 166070
 
5.8%
t 160119
 
5.6%
l 128749
 
4.5%
s 112733
 
3.9%
Other values (16) 626828
21.9%
Uppercase Letter
ValueCountFrequency (%)
F 271104
33.5%
S 78359
 
9.7%
R 63222
 
7.8%
L 61354
 
7.6%
C 52642
 
6.5%
G 37852
 
4.7%
B 36649
 
4.5%
M 26756
 
3.3%
P 26718
 
3.3%
Y 24537
 
3.0%
Other values (15) 130490
16.1%
Other Punctuation
ValueCountFrequency (%)
. 2426
62.7%
, 703
 
18.2%
? 651
 
16.8%
' 64
 
1.7%
/ 19
 
0.5%
" 4
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 147
94.2%
3 3
 
1.9%
9 2
 
1.3%
2 2
 
1.3%
0 2
 
1.3%
Space Separator
ValueCountFrequency (%)
449999
100.0%
Open Punctuation
ValueCountFrequency (%)
( 135
100.0%
Close Punctuation
ValueCountFrequency (%)
) 134
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 69
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3668373
89.0%
Common 454360
 
11.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 361227
 
9.8%
n 317355
 
8.7%
m 288475
 
7.9%
F 271104
 
7.4%
r 245377
 
6.7%
o 238913
 
6.5%
a 212844
 
5.8%
i 166070
 
4.5%
t 160119
 
4.4%
l 128749
 
3.5%
Other values (41) 1278140
34.8%
Common
ValueCountFrequency (%)
449999
99.0%
. 2426
 
0.5%
, 703
 
0.2%
? 651
 
0.1%
1 147
 
< 0.1%
( 135
 
< 0.1%
) 134
 
< 0.1%
- 69
 
< 0.1%
' 64
 
< 0.1%
/ 19
 
< 0.1%
Other values (5) 13
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4122733
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
449999
 
10.9%
e 361227
 
8.8%
n 317355
 
7.7%
m 288475
 
7.0%
F 271104
 
6.6%
r 245377
 
6.0%
o 238913
 
5.8%
a 212844
 
5.2%
i 166070
 
4.0%
t 160119
 
3.9%
Other values (56) 1411250
34.2%

member
Text

Missing 

Distinct 1626
Distinct (%) 2.0%
Missing 643191
Missing (%) 88.8%
Memory size 5.5 MiB

Length

Max length 31
Median length 30
Mean length 13.99831524
Min length 1

Characters and Unicode

Total characters 1138301
Distinct characters 70
Distinct categories 9
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 471
Unique (%) 0.6%

Sample

1st row Fossil Hill Mbr
2nd row Decie Ranch Mbr
3rd row Millersburg Mbr
4th row Thin-Bedded Zone Of Udden
5th row Burgess Sh Mbr

Value Count Frequency (%)
mbr 79698 34.1%
sh 36967 15.8%
burgess 30811 13.2%
ls 6535 2.8%
creek 4230 1.8%
sunken 3525 1.5%
meadow 3525 1.5%
ranch 3361 1.4%
francis 2603 1.1%
b 2492 1.1%
Other values (1500) 60135 25.7%

Most occurring characters

ValueCountFrequency (%)
152565
13.4%
r 138201
12.1%
M 87327
 
7.7%
s 86157
 
7.6%
b 84523
 
7.4%
e 79157
 
7.0%
h 47967
 
4.2%
S 46866
 
4.1%
u 42615
 
3.7%
a 41195
 
3.6%
Other values (60) 331728
29.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 749978
65.9%
Uppercase Letter 232978
 
20.5%
Space Separator 152565
 
13.4%
Decimal Number 2131
 
0.2%
Other Punctuation 324
 
< 0.1%
Dash Punctuation 290
 
< 0.1%
Open Punctuation 17
 
< 0.1%
Close Punctuation 17
 
< 0.1%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 138201
18.4%
s 86157
11.5%
b 84523
11.3%
e 79157
10.6%
h 47967
 
6.4%
u 42615
 
5.7%
a 41195
 
5.5%
g 38517
 
5.1%
n 36464
 
4.9%
i 27554
 
3.7%
Other values (16) 127628
17.0%
Uppercase Letter
ValueCountFrequency (%)
M 87327
37.5%
S 46866
20.1%
B 39596
17.0%
C 10761
 
4.6%
L 9429
 
4.0%
R 5451
 
2.3%
F 4926
 
2.1%
P 4323
 
1.9%
G 4164
 
1.8%
W 4116
 
1.8%
Other values (15) 16019
 
6.9%
Decimal Number
ValueCountFrequency (%)
1 858
40.3%
2 337
 
15.8%
3 289
 
13.6%
4 247
 
11.6%
5 130
 
6.1%
0 124
 
5.8%
6 102
 
4.8%
7 24
 
1.1%
9 16
 
0.8%
8 4
 
0.2%
Other Punctuation
ValueCountFrequency (%)
, 131
40.4%
. 128
39.5%
? 64
19.8%
' 1
 
0.3%
Space Separator
ValueCountFrequency (%)
152565
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 290
100.0%
Open Punctuation
ValueCountFrequency (%)
( 17
100.0%
Close Punctuation
ValueCountFrequency (%)
) 17
100.0%
Math Symbol
ValueCountFrequency (%)
= 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 982956
86.4%
Common 155345
 
13.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 138201
14.1%
M 87327
 
8.9%
s 86157
 
8.8%
b 84523
 
8.6%
e 79157
 
8.1%
h 47967
 
4.9%
S 46866
 
4.8%
u 42615
 
4.3%
a 41195
 
4.2%
B 39596
 
4.0%
Other values (41) 289352
29.4%
Common
ValueCountFrequency (%)
152565
98.2%
1 858
 
0.6%
2 337
 
0.2%
- 290
 
0.2%
3 289
 
0.2%
4 247
 
0.2%
, 131
 
0.1%
5 130
 
0.1%
. 128
 
0.1%
0 124
 
0.1%
Other values (9) 246
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1138301
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
152565
13.4%
r 138201
12.1%
M 87327
 
7.7%
s 86157
 
7.6%
b 84523
 
7.4%
e 79157
 
7.0%
h 47967
 
4.2%
S 46866
 
4.1%
u 42615
 
3.7%
a 41195
 
3.6%
Other values (60) 331728
29.1%

typeStatus
Text

Missing 

Distinct 57
Distinct (%) < 0.1%
Missing 581882
Missing (%) 80.3%
Memory size 5.5 MiB

Length

Max length 32
Median length 8
Mean length 7.816414959
Min length 4

Characters and Unicode

Total characters 1114824
Distinct characters 25
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 18
Unique (%) < 0.1%

Sample

1st row Paratype
2nd row Paratype
3rd row Paratype
4th row Type
5th row Holotype

Value Count Frequency (%)
paratype 74620 52.2%
holotype 34727 24.3%
syntype 19596 13.7%
type 7957 5.6%
paralectotype 2999 2.1%
lectotype 1087 0.8%
plastoholotype 595 0.4%
plastotype 390 0.3%
plastoparatype 282 0.2%
plastosyntype 253 0.2%
Other values (12) 325 0.2%

Most occurring characters

ValueCountFrequency (%)
y 162651
14.6%
a 157416
14.1%
e 147041
13.2%
p 143090
12.8%
t 140517
12.6%
P 79203
7.1%
r 77963
7.0%
o 76542
6.9%
l 39911
 
3.6%
H 34727
 
3.1%
Other values (15) 55763
 
5.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 971641
87.2%
Uppercase Letter 142831
 
12.8%
Space Separator 205
 
< 0.1%
Other Punctuation 147
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
y 162651
16.7%
a 157416
16.2%
e 147041
15.1%
p 143090
14.7%
t 140517
14.5%
r 77963
8.0%
o 76542
7.9%
l 39911
 
4.1%
n 19880
 
2.0%
c 4119
 
0.4%
Other values (3) 2511
 
0.3%
Uppercase Letter
ValueCountFrequency (%)
P 79203
55.5%
H 34727
24.3%
S 19625
 
13.7%
T 7957
 
5.6%
L 1087
 
0.8%
N 143
 
0.1%
O 29
 
< 0.1%
I 28
 
< 0.1%
M 19
 
< 0.1%
C 13
 
< 0.1%
Space Separator
ValueCountFrequency (%)
205
100.0%
Other Punctuation
ValueCountFrequency (%)
; 147
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1114472
> 99.9%
Common 352
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
y 162651
14.6%
a 157416
14.1%
e 147041
13.2%
p 143090
12.8%
t 140517
12.6%
P 79203
7.1%
r 77963
7.0%
o 76542
6.9%
l 39911
 
3.6%
H 34727
 
3.1%
Other values (13) 55411
 
5.0%
Common
ValueCountFrequency (%)
205
58.2%
; 147
41.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1114824
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
y 162651
14.6%
a 157416
14.1%
e 147041
13.2%
p 143090
12.8%
t 140517
12.6%
P 79203
7.1%
r 77963
7.0%
o 76542
6.9%
l 39911
 
3.6%
H 34727
 
3.1%
Other values (15) 55763
 
5.0%

identifiedBy
Text

Missing 

Distinct 2463
Distinct (%) 1.2%
Missing 521981
Missing (%) 72.0%
Memory size 5.5 MiB

Length

Max length 147
Median length 124
Mean length 22.47668212
Min length 2

Characters and Unicode

Total characters 4552135
Distinct characters 68
Distinct categories 7
Distinct scripts 2
Distinct blocks 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 535
Unique (%) 0.3%

Sample

1st row Silberling; Nichols
2nd row Vaughan
3rd row Harper; Boucot
4th row Said; Barakat, M. G.
5th row Smith
ValueCountFrequency (%)
united 21468
 
3.2%
states 21082
 
3.2%
of 20281
 
3.1%
museum 15734
 
2.4%
helen 15316
 
2.3%
12006
 
1.8%
natural 11887
 
1.8%
history 11620
 
1.8%
institution 11572
 
1.7%
smithsonian 11571
 
1.7%
Other values (2466) 510240
77.0%

Most occurring characters

Value                 Count     Frequency (%)
(space)               460250    10.1%
e                     280098    6.2%
o                     272102    6.0%
a                     259642    5.7%
n                     241275    5.3%
t                     230888    5.1%
r                     226036    5.0%
i                     214007    4.7%
l                     181066    4.0%
s                     174306    3.8%
Other values (58)     2012465   44.2%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      2806351   61.6%
Uppercase Letter      908175    20.0%
Space Separator       460250    10.1%
Other Punctuation     280258    6.2%
Close Punctuation     40168     0.9%
Open Punctuation      40168     0.9%
Dash Punctuation      16765     0.4%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
e                     280098    10.0%
o                     272102    9.7%
a                     259642    9.3%
n                     241275    8.6%
t                     230888    8.2%
r                     226036    8.1%
i                     214007    7.6%
l                     181066    6.5%
s                     174306    6.2%
u                     121224    4.3%
Other values (22)     605707    21.6%

Uppercase Letter
Value                 Count     Frequency (%)
S                     117932    13.0%
T                     78022     8.6%
A                     60143     6.6%
N                     59104     6.5%
C                     57622     6.3%
E                     56100     6.2%
I                     46266     5.1%
D                     44046     4.8%
H                     42705     4.7%
U                     40270     4.4%
Other values (16)     305965    33.7%

Other Punctuation
Value                 Count     Frequency (%)
,                     138675    49.5%
.                     77116     27.5%
;                     64257     22.9%
/                     177       0.1%
'                     23        < 0.1%
&                     10        < 0.1%

Space Separator
Value                 Count     Frequency (%)
(space)               460250    100.0%

Close Punctuation
Value                 Count     Frequency (%)
)                     40168     100.0%

Open Punctuation
Value                 Count     Frequency (%)
(                     40168     100.0%

Dash Punctuation
Value                 Count     Frequency (%)
-                     16765     100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 3714526   81.6%
Common                837609    18.4%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
e                     280098    7.5%
o                     272102    7.3%
a                     259642    7.0%
n                     241275    6.5%
t                     230888    6.2%
r                     226036    6.1%
i                     214007    5.8%
l                     181066    4.9%
s                     174306    4.7%
u                     121224    3.3%
Other values (48)     1513882   40.8%

Common
Value                 Count     Frequency (%)
(space)               460250    54.9%
,                     138675    16.6%
.                     77116     9.2%
;                     64257     7.7%
)                     40168     4.8%
(                     40168     4.8%
-                     16765     2.0%
/                     177       < 0.1%
'                     23        < 0.1%
&                     10        < 0.1%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 4550350   > 99.9%
None                  1785      < 0.1%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
(space)               460250    10.1%
e                     280098    6.2%
o                     272102    6.0%
a                     259642    5.7%
n                     241275    5.3%
t                     230888    5.1%
r                     226036    5.0%
i                     214007    4.7%
l                     181066    4.0%
s                     174306    3.8%
Other values (52)     2010680   44.2%

None
Value                 Count     Frequency (%)
ñ                     1143      64.0%
ý                     251       14.1%
š                     251       14.1%
ö                     138       7.7%
ú                     1         0.1%
í                     1         0.1%
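
The None block above collects the characters that fall outside ASCII (ñ, ý, š, ö, ú, í). One way to locate such characters in a text column is sketched below using only the Python standard library. The function name non_ascii_counts and the example values are hypothetical and are not taken from the dataset.

```python
import unicodedata
from collections import Counter


def non_ascii_counts(values):
    """Count every character with a code point above 127 in an iterable of strings."""
    counts = Counter()
    for value in values:
        if not value:  # skip None and empty strings (missing cells)
            continue
        counts.update(ch for ch in value if ord(ch) > 127)
    return counts


# Hypothetical example values, not rows from the dataset.
examples = ["Muñoz", "Böhm; Peña", None, "Smith"]
found = non_ascii_counts(examples)

# Print each character with its code point and official Unicode name.
for ch, n in found.most_common():
    print(f"{ch}  U+{ord(ch):04X}  {unicodedata.name(ch)}  {n}")
```

Listing the code point and character name next to each count makes it easier to tell whether a character is a legitimate diacritic in a collector's name or an encoding problem.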

scientificName
Text

Missing

Distinct: 97401
Distinct (%): 17.6%
Missing: 171332
Missing (%): 23.6%
Memory size: 5.5 MiB

Length

Max length: 62
Median length: 56
Mean length: 18.07695742
Min length: 5

Characters and Unicode

Total characters: 9999739
Distinct characters: 72
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44766
Unique (%): 8.1%

Sample

1st row: Damaliscus lunatus
2nd row: Acrochordiceras hyatti
3rd row: Discocyclina (Asterocyclina) sculpturata
4th row: Odontaspis cuspidata
5th row: Enteletes rotundobesus
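
The summary figures for scientificName can be reproduced from the raw counts if one assumes that Distinct (%), Unique (%) and Mean length are taken over non-missing values, while Missing (%) is taken over all 724508 observations reported in the Overview. That choice of denominators is inferred from the numbers and is not stated in the report. A short check in Python:

```python
# Counts copied from the Overview and from the scientificName summary.
total_rows = 724508
missing = 171332
distinct = 97401
unique = 44766
total_chars = 9999739

# Assumed denominator for the per-value statistics: non-missing cells only.
present = total_rows - missing

missing_pct = 100 * missing / total_rows
distinct_pct = 100 * distinct / present
unique_pct = 100 * unique / present
mean_length = total_chars / present

print(f"present values: {present}")
print(f"Missing (%):    {missing_pct:.1f}")
print(f"Distinct (%):   {distinct_pct:.1f}")
print(f"Unique (%):     {unique_pct:.1f}")
print(f"Mean length:    {mean_length:.8f}")
```

The same arithmetic can be applied to the other variables in this report as a quick consistency check on their summaries.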
Value                 Count     Frequency (%)
sp                    136960    12.1%
genus                 56232     5.0%
insecta               16851     1.5%
splendens             12400     1.1%
marrella              12281     1.1%
pterodroma            7305      0.6%
var                   6498      0.6%
callophoca            3770      0.3%
isurus                3463      0.3%
ostracoda             3391      0.3%
Other values (53913)  873954    77.1%
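
The tokens in the table above are lower case and carry no punctuation, although the sample rows contain capitals and parentheses. This suggests the values were lowercased, split on whitespace, and stripped of surrounding punctuation before counting. The sketch below reproduces that kind of tally under those assumptions; it is not the report generator's actual code, and word_counts is a hypothetical helper.

```python
import string
from collections import Counter


def word_counts(values):
    """Tally lowercased whitespace-separated tokens, stripped of edge punctuation."""
    counts = Counter()
    for value in values:
        if not value:  # skip missing cells
            continue
        for token in value.lower().split():
            # A token made only of punctuation becomes an empty string here.
            counts[token.strip(string.punctuation)] += 1
    return counts


# The five sample rows shown for scientificName.
rows = [
    "Damaliscus lunatus",
    "Acrochordiceras hyatti",
    "Discocyclina (Asterocyclina) sculpturata",
    "Odontaspis cuspidata",
    "Enteletes rotundobesus",
]
counts = word_counts(rows)
for token, n in counts.most_common(3):
    print(token, n)
```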

Most occurring characters

Value                 Count     Frequency (%)
a                     1021294   10.2%
s                     909134    9.1%
i                     819278    8.2%
e                     762530    7.6%
o                     610330    6.1%
r                     609311    6.1%
n                     592254    5.9%
(space)               579929    5.8%
l                     537519    5.4%
u                     466436    4.7%
Other values (62)     3091724   30.9%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      8787040   87.9%
Space Separator       579929    5.8%
Uppercase Letter      575487    5.8%
Close Punctuation     22326     0.2%
Open Punctuation      22314     0.2%
Other Punctuation     10186     0.1%
Decimal Number        1938      < 0.1%
Dash Punctuation      518       < 0.1%
Math Symbol           1         < 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
a                     1021294   11.6%
s                     909134    10.3%
i                     819278    9.3%
e                     762530    8.7%
o                     610330    6.9%
r                     609311    6.9%
n                     592254    6.7%
l                     537519    6.1%
u                     466436    5.3%
t                     465047    5.3%
Other values (16)     1993907   22.7%

Uppercase Letter
Value                 Count     Frequency (%)
G                     79813     13.9%
P                     69195     12.0%
C                     60147     10.5%
A                     39927     6.9%
M                     39806     6.9%
S                     35677     6.2%
B                     27831     4.8%
H                     26616     4.6%
T                     26590     4.6%
I                     25413     4.4%
Other values (16)     144472    25.1%

Decimal Number
Value                 Count     Frequency (%)
1                     962       49.6%
2                     543       28.0%
3                     206       10.6%
4                     92        4.7%
5                     67        3.5%
6                     38        2.0%
7                     19        1.0%
8                     5         0.3%
0                     4         0.2%
9                     2         0.1%

Other Punctuation
Value                 Count     Frequency (%)
.                     10146     99.6%
'                     21        0.2%
?                     13        0.1%
*                     5         < 0.1%
#                     1         < 0.1%

Space Separator
Value                 Count     Frequency (%)
(space)               579929    100.0%

Close Punctuation
Value                 Count     Frequency (%)
)                     22326     100.0%

Open Punctuation
Value                 Count     Frequency (%)
(                     22314     100.0%

Dash Punctuation
Value                 Count     Frequency (%)
-                     518       100.0%

Math Symbol
Value                 Count     Frequency (%)
+                     1         100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 9362527   93.6%
Common                637212    6.4%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
a                     1021294   10.9%
s                     909134    9.7%
i                     819278    8.8%
e                     762530    8.1%
o                     610330    6.5%
r                     609311    6.5%
n                     592254    6.3%
l                     537519    5.7%
u                     466436    5.0%
t                     465047    5.0%
Other values (42)     2569394   27.4%

Common
Value                 Count     Frequency (%)
(space)               579929    91.0%
)                     22326     3.5%
(                     22314     3.5%
.                     10146     1.6%
1                     962       0.2%
2                     543       0.1%
-                     518       0.1%
3                     206       < 0.1%
4                     92        < 0.1%
5                     67        < 0.1%
Other values (10)     109       < 0.1%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 9999739   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
a                     1021294   10.2%
s                     909134    9.1%
i                     819278    8.2%
e                     762530    7.6%
o                     610330    6.1%
r                     609311    6.1%
n                     592254    5.9%
(space)               579929    5.8%
l                     537519    5.4%
u                     466436    4.7%
Other values (62)     3091724   30.9%

higherClassification
Text

Missing

Distinct: 3844
Distinct (%): 0.7%
Missing: 172643
Missing (%): 23.8%
Memory size: 5.5 MiB

Length

Max length: 141
Median length: 123
Mean length: 59.08444638
Min length: 5

Characters and Unicode

Total characters: 32606638
Distinct characters: 61
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 743
Unique (%): 0.1%

Sample

1st row: Animalia, Chordata, Vertebrata, Mammalia, Eutheria, Laurasiatheria, Artiodactyla, Ruminatia, Bovidae
2nd row: Animalia, Mollusca, Cephalopoda, Ammonoidea
3rd row: Chromista, Foraminifera, Globothalamea, Rotaliida, Discocyclinidae
4th row: Animalia, Chordata, Vertebrata, Pisces, Chondrichthyes, Elasmobranchii, Galeomorphii, Lamniformes, Odontaspididae
5th row: Animalia, Brachiopoda, Rhynchonellata, Orthida, Enteletidae
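
Each sampled higherClassification value is a comma-separated path that starts at the kingdom and ends at the lowest rank recorded for that specimen. A minimal sketch for splitting such a value into its parts follows. The helper name split_classification is hypothetical, and the assumption that a comma never occurs inside a taxon name has not been checked against the full column.

```python
def split_classification(value):
    """Split a comma-separated classification path into a list of taxon names."""
    if not value:  # missing cell
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


# First sample row shown for higherClassification.
row = (
    "Animalia, Chordata, Vertebrata, Mammalia, Eutheria, "
    "Laurasiatheria, Artiodactyla, Ruminatia, Bovidae"
)
ranks = split_classification(row)

print("highest rank:", ranks[0])
print("lowest rank: ", ranks[-1])
print("depth:       ", len(ranks))
```

The depth of the path varies between rows (the second sample row has four levels, the first has nine), so a position in the list cannot be mapped to a fixed rank without further information.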
Value                 Count     Frequency (%)
animalia              448323    15.7%
chordata              148700    5.2%
vertebrata            148618    5.2%
arthropoda            100318    3.5%
mollusca              69025     2.4%
brachiopoda           66748     2.3%
foraminifera          66301     2.3%
chromista             65999     2.3%
mammalia              60027     2.1%
eutheria              57586     2.0%
Other values (3834)   1620986   56.8%

Most occurring characters

Value                 Count     Frequency (%)
a                     4706865   14.4%
i                     3184420   9.8%
(space)               2300766   7.1%
,                     2260526   6.9%
o                     2052009   6.3%
r                     2005114   6.1%
e                     1809015   5.5%
t                     1671086   5.1%
l                     1501858   4.6%
n                     1400746   4.3%
Other values (51)     9714233   29.8%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      25197474  77.3%
Uppercase Letter      2811914   8.6%
Space Separator       2300766   7.1%
Other Punctuation     2295928   7.0%
Decimal Number        471       < 0.1%
Open Punctuation      42        < 0.1%
Close Punctuation     42        < 0.1%
Dash Punctuation      1         < 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
a                     4706865   18.7%
i                     3184420   12.6%
o                     2052009   8.1%
r                     2005114   8.0%
e                     1809015   7.2%
t                     1671086   6.6%
l                     1501858   6.0%
n                     1400746   5.6%
d                     1257138   5.0%
m                     1113235   4.4%
Other values (16)     4495988   17.8%

Uppercase Letter
Value                 Count     Frequency (%)
A                     662527    23.6%
C                     427513    15.2%
P                     199516    7.1%
M                     161377    5.7%
V                     161299    5.7%
S                     144831    5.2%
E                     143204    5.1%
R                     141162    5.0%
B                     123534    4.4%
G                     116236    4.1%
Other values (16)     530715    18.9%

Other Punctuation
Value                 Count     Frequency (%)
,                     2260526   98.5%
.                     35391     1.5%
"                     8         < 0.1%
?                     3         < 0.1%

Space Separator
Value                 Count     Frequency (%)
(space)               2300766   100.0%

Decimal Number
Value                 Count     Frequency (%)
0                     471       100.0%

Open Punctuation
Value                 Count     Frequency (%)
(                     42        100.0%

Close Punctuation
Value                 Count     Frequency (%)
)                     42        100.0%

Dash Punctuation
Value                 Count     Frequency (%)
-                     1         100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 28009388  85.9%
Common                4597250   14.1%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
a                     4706865   16.8%
i                     3184420   11.4%
o                     2052009   7.3%
r                     2005114   7.2%
e                     1809015   6.5%
t                     1671086   6.0%
l                     1501858   5.4%
n                     1400746   5.0%
d                     1257138   4.5%
m                     1113235   4.0%
Other values (42)     7307902   26.1%

Common
Value                 Count     Frequency (%)
(space)               2300766   50.0%
,                     2260526   49.2%
.                     35391     0.8%
0                     471       < 0.1%
(                     42        < 0.1%
)                     42        < 0.1%
"                     8         < 0.1%
?                     3         < 0.1%
-                     1         < 0.1%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 32606638  100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
a                     4706865   14.4%
i                     3184420   9.8%
(space)               2300766   7.1%
,                     2260526   6.9%
o                     2052009   6.3%
r                     2005114   6.1%
e                     1809015   5.5%
t                     1671086   5.1%
l                     1501858   4.6%
n                     1400746   4.3%
Other values (51)     9714233   29.8%

kingdom
Text

Missing

Distinct: 9
Distinct (%): < 0.1%
Missing: 172847
Missing (%): 23.9%
Memory size: 5.5 MiB

Length

Max length: 14
Median length: 8
Mean length: 8.052434375
Min length: 5

Characters and Unicode

Total characters: 4442214
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Animalia
2nd row: Animalia
3rd row: Chromista
4th row: Animalia
5th row: Animalia
Value                 Count     Frequency (%)
animalia              448322    81.3%
chromista             65985     12.0%
plantae               37205     6.7%
protoctista           66        < 0.1%
protozoa              44        < 0.1%
biota                 28        < 0.1%
incertae              5         < 0.1%
sedis                 5         < 0.1%
bacteria              5         < 0.1%
arthropoda            1         < 0.1%

Most occurring characters

Value                 Count     Frequency (%)
a                     1037193   23.3%
i                     962733    21.7%
m                     514307    11.6%
n                     485532    10.9%
l                     485527    10.9%
A                     448323    10.1%
t                     103471    2.3%
o                     66279     1.5%
r                     66107     1.5%
s                     66061     1.5%
Other values (11)     206681    4.7%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      3890548   87.6%
Uppercase Letter      551661    12.4%
Space Separator       5         < 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
a                     1037193   26.7%
i                     962733    24.7%
m                     514307    13.2%
n                     485532    12.5%
l                     485527    12.5%
t                     103471    2.7%
o                     66279     1.7%
r                     66107     1.7%
s                     66061     1.7%
h                     65986     1.7%
Other values (5)      37352     1.0%

Uppercase Letter
Value                 Count     Frequency (%)
A                     448323    81.3%
C                     65985     12.0%
P                     37315     6.8%
B                     33        < 0.1%
I                     5         < 0.1%

Space Separator
Value                 Count     Frequency (%)
(space)               5         100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 4442209   > 99.9%
Common                5         < 0.1%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
a                     1037193   23.3%
i                     962733    21.7%
m                     514307    11.6%
n                     485532    10.9%
l                     485527    10.9%
A                     448323    10.1%
t                     103471    2.3%
o                     66279     1.5%
r                     66107     1.5%
s                     66061     1.5%
Other values (10)     206676    4.7%

Common
Value                 Count     Frequency (%)
(space)               5         100.0%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 4442214   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
a                     1037193   23.3%
i                     962733    21.7%
m                     514307    11.6%
n                     485532    10.9%
l                     485527    10.9%
A                     448323    10.1%
t                     103471    2.3%
o                     66279     1.5%
r                     66107     1.5%
s                     66061     1.5%
Other values (11)     206681    4.7%

phylum
Text

Missing

Distinct: 34
Distinct (%): < 0.1%
Missing: 211856
Missing (%): 29.2%
Memory size: 5.5 MiB

Length

Max length: 17
Median length: 14
Mean length: 9.567853047
Min length: 5

Characters and Unicode

Total characters: 4904979
Distinct characters: 34
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: Chordata
2nd row: Mollusca
3rd row: Foraminifera
4th row: Chordata
5th row: Brachiopoda
Value                 Count     Frequency (%)
chordata              148700    29.0%
arthropoda            100304    19.5%
mollusca              69025     13.4%
brachiopoda           66748     13.0%
foraminifera          65986     12.9%
echinodermata         26599     5.2%
bryozoa               12874     2.5%
cnidaria              7243      1.4%
protozoa              4080      0.8%
porifera              2897      0.6%
Other values (27)     8947      1.7%

Most occurring characters

Value                 Count     Frequency (%)
a                     832296    17.0%
o                     688644    14.0%
r                     609931    12.4%
d                     357317    7.3%
h                     344816    7.0%
t                     283255    5.8%
i                     252208    5.1%
p                     168801    3.4%
c                     165860    3.4%
C                     156159    3.2%
Other values (24)     1045692   21.3%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      4391575   89.5%
Uppercase Letter      512652    10.5%
Space Separator       751       < 0.1%
Other Punctuation     1         < 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
a                     832296    19.0%
o                     688644    15.7%
r                     609931    13.9%
d                     357317    8.1%
h                     344816    7.9%
t                     283255    6.4%
i                     252208    5.7%
p                     168801    3.8%
c                     165860    3.8%
l                     143009    3.3%
Other values (10)     545438    12.4%

Uppercase Letter
Value                 Count     Frequency (%)
C                     156159    30.5%
A                     103373    20.2%
B                     79622     15.5%
M                     69031     13.5%
F                     65986     12.9%
E                     26614     5.2%
P                     8593      1.7%
H                     2435      0.5%
I                     754       0.1%
G                     66        < 0.1%
Other values (2)      19        < 0.1%

Space Separator
Value                 Count     Frequency (%)
(space)               751       100.0%

Other Punctuation
Value                 Count     Frequency (%)
.                     1         100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 4904227   > 99.9%
Common                752       < 0.1%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
a                     832296    17.0%
o                     688644    14.0%
r                     609931    12.4%
d                     357317    7.3%
h                     344816    7.0%
t                     283255    5.8%
i                     252208    5.1%
p                     168801    3.4%
c                     165860    3.4%
C                     156159    3.2%
Other values (22)     1044940   21.3%

Common
Value                 Count     Frequency (%)
(space)               751       99.9%
.                     1         0.1%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 4904979   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
a                     832296    17.0%
o                     688644    14.0%
r                     609931    12.4%
d                     357317    7.3%
h                     344816    7.0%
t                     283255    5.8%
i                     252208    5.1%
p                     168801    3.4%
c                     165860    3.4%
C                     156159    3.2%
Other values (24)     1045692   21.3%

class
Text

Missing

Distinct: 145
Distinct (%): < 0.1%
Missing: 235611
Missing (%): 32.5%
Memory size: 5.5 MiB

Length

Max length: 27
Median length: 19
Mean length: 9.967651673
Min length: 4

Characters and Unicode

Total characters: 4873155
Distinct characters: 49
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): < 0.1%

Sample

1st row: Mammalia
2nd row: Cephalopoda
3rd row: Globothalamea
4th row: Chondrichthyes
5th row: Rhynchonellata
Value                 Count     Frequency (%)
mammalia              60027     12.2%
globothalamea         41779     8.5%
rhynchonellata        39023     7.9%
aves                  34583     7.0%
insecta               29284     6.0%
chondrichthyes        26607     5.4%
gastropoda            24466     5.0%
ostracoda             24047     4.9%
trilobita             22871     4.7%
bivalvia              22291     4.5%
Other values (133)    165921    33.8%

Most occurring characters

Value                 Count     Frequency (%)
a                     859113    17.6%
o                     453975    9.3%
t                     367169    7.5%
l                     337501    6.9%
i                     301652    6.2%
e                     293993    6.0%
h                     287732    5.9%
n                     212707    4.4%
s                     207229    4.3%
m                     199854    4.1%
Other values (39)     1352230   27.7%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      4382031   89.9%
Uppercase Letter      488897    10.0%
Space Separator       2002      < 0.1%
Other Punctuation     179       < 0.1%
Open Punctuation      23        < 0.1%
Close Punctuation     23        < 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
a                     859113    19.6%
o                     453975    10.4%
t                     367169    8.4%
l                     337501    7.7%
i                     301652    6.9%
e                     293993    6.7%
h                     287732    6.6%
n                     212707    4.9%
s                     207229    4.7%
m                     199854    4.6%
Other values (14)     861106    19.7%

Uppercase Letter
Value                 Count     Frequency (%)
C                     78641     16.1%
G                     68696     14.1%
M                     63598     13.0%
R                     49366     10.1%
A                     43821     9.0%
O                     34647     7.1%
T                     31644     6.5%
I                     29906     6.1%
B                     26753     5.5%
S                     20027     4.1%
Other values (11)     41798     8.5%

Space Separator
Value                 Count     Frequency (%)
(space)               2002      100.0%

Other Punctuation
Value                 Count     Frequency (%)
.                     179       100.0%

Open Punctuation
Value                 Count     Frequency (%)
(                     23        100.0%

Close Punctuation
Value                 Count     Frequency (%)
)                     23        100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 4870928   > 99.9%
Common                2227      < 0.1%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
a                     859113    17.6%
o                     453975    9.3%
t                     367169    7.5%
l                     337501    6.9%
i                     301652    6.2%
e                     293993    6.0%
h                     287732    5.9%
n                     212707    4.4%
s                     207229    4.3%
m                     199854    4.1%
Other values (35)     1350003   27.7%

Common
Value                 Count     Frequency (%)
(space)               2002      89.9%
.                     179       8.0%
(                     23        1.0%
)                     23        1.0%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 4873155   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
a                     859113    17.6%
o                     453975    9.3%
t                     367169    7.5%
l                     337501    6.9%
i                     301652    6.2%
e                     293993    6.0%
h                     287732    5.9%
n                     212707    4.4%
s                     207229    4.3%
m                     199854    4.1%
Other values (39)     1352230   27.7%

order
Text

Missing

Distinct: 552
Distinct (%): 0.2%
Missing: 400004
Missing (%): 55.2%
Memory size: 5.5 MiB

Length

Max length: 28
Median length: 22
Mean length: 11.13181656
Min length: 1

Characters and Unicode

Total characters: 3612319
Distinct characters: 54
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 66
Unique (%): < 0.1%

Sample

1st row: Artiodactyla
2nd row: Ammonoidea
3rd row: Rotaliida
4th row: Lamniformes
5th row: Orthida
Value                 Count     Frequency (%)
rotaliida             32318     9.7%
lamniformes           12411     3.7%
spiriferida           11138     3.3%
cetacea               10502     3.1%
productida            10020     3.0%
procellariiformes     9895      3.0%
ammonoidea            9257      2.8%
order                 9090      2.7%
artiodactyla          8886      2.7%
terebratulida         8672      2.6%
Other values (536)    212022    63.4%

Most occurring characters

Value                 Count     Frequency (%)
i                     454969    12.6%
a                     442612    12.3%
r                     320973    8.9%
o                     301998    8.4%
e                     264934    7.3%
d                     249362    6.9%
t                     203578    5.6%
l                     161146    4.5%
s                     140573    3.9%
n                     136028    3.8%
Other values (44)     936146    25.9%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      3269472   90.5%
Uppercase Letter      324156    9.0%
Space Separator       9707      0.3%
Other Punctuation     8600      0.2%
Decimal Number        348       < 0.1%
Open Punctuation      18        < 0.1%
Close Punctuation     18        < 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
i                     454969    13.9%
a                     442612    13.5%
r                     320973    9.8%
o                     301998    9.2%
e                     264934    8.1%
d                     249362    7.6%
t                     203578    6.2%
l                     161146    4.9%
s                     140573    4.3%
n                     136028    4.2%
Other values (16)     593299    18.1%

Uppercase Letter
Value                 Count     Frequency (%)
P                     57757     17.8%
R                     46961     14.5%
C                     44552     13.7%
A                     31461     9.7%
S                     29800     9.2%
L                     25584     7.9%
O                     20494     6.3%
T                     18568     5.7%
M                     9915      3.1%
D                     8635      2.7%
Other values (12)     30429     9.4%

Other Punctuation
Value                 Count     Frequency (%)
.                     8599      > 99.9%
,                     1         < 0.1%

Space Separator
Value                 Count     Frequency (%)
(space)               9707      100.0%

Decimal Number
Value                 Count     Frequency (%)
0                     348       100.0%

Open Punctuation
Value                 Count     Frequency (%)
(                     18        100.0%

Close Punctuation
Value                 Count     Frequency (%)
)                     18        100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 3593628   99.5%
Common                18691     0.5%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
i                     454969    12.7%
a                     442612    12.3%
r                     320973    8.9%
o                     301998    8.4%
e                     264934    7.4%
d                     249362    6.9%
t                     203578    5.7%
l                     161146    4.5%
s                     140573    3.9%
n                     136028    3.8%
Other values (38)     917455    25.5%

Common
Value                 Count     Frequency (%)
(space)               9707      51.9%
.                     8599      46.0%
0                     348       1.9%
(                     18        0.1%
)                     18        0.1%
,                     1         < 0.1%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 3612319   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
i                     454969    12.6%
a                     442612    12.3%
r                     320973    8.9%
o                     301998    8.4%
e                     264934    7.3%
d                     249362    6.9%
t                     203578    5.6%
l                     161146    4.5%
s                     140573    3.9%
n                     136028    3.8%
Other values (44)     936146    25.9%

family
Text

Missing

Distinct: 2441
Distinct (%): 0.8%
Missing: 409455
Missing (%): 56.5%
Memory size: 5.5 MiB

Length

Max length: 31
Median length: 23
Mean length: 12.35823496
Min length: 1

Characters and Unicode

Total characters: 3893499
Distinct characters: 60
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 406
Unique (%): 0.1%

Sample

1st row: Bovidae
2nd row: Discocyclinidae
3rd row: Odontaspididae
4th row: Enteletidae
5th row: Procellariidae
Value                 Count     Frequency (%)
family                24920     7.3%
indet                 24361     7.2%
procellariidae        9409      2.8%
carcharhinidae        6802      2.0%
lamnidae              6398      1.9%
anatidae              5246      1.5%
equidae               4518      1.3%
phocidae              4479      1.3%
odontaspididae        3901      1.1%
vaginulinidae         3658      1.1%
Other values (2428)   246880    72.5%
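
The two most frequent tokens above, family (24920) and indet (24361), are not family names. They look like placeholder wording for specimens that were not determined to family; that reading is an interpretation of the counts, not something the report states. If such placeholders need to be separated from real names before analysis, a filter along the following lines could be used. The token set, the helper name is_placeholder, and the example strings are all hypothetical.

```python
import string

# Hypothetical placeholder vocabulary; adjust after inspecting the real values.
PLACEHOLDER_TOKENS = {"family", "indet", "genus", "order", "sp"}


def is_placeholder(value):
    """Return True when every word in the value is a placeholder token."""
    if not value:  # missing cells are not treated as placeholders
        return False
    tokens = [word.strip(string.punctuation).lower() for word in value.split()]
    tokens = [token for token in tokens if token]
    return bool(tokens) and all(token in PLACEHOLDER_TOKENS for token in tokens)


# Hypothetical example values, not rows from the dataset.
examples = ["Family indet.", "Bovidae", "Lamnidae", None]
flags = [is_placeholder(value) for value in examples]

for value, flag in zip(examples, flags):
    print(repr(value), flag)
```

Requiring every token to be a placeholder keeps a value that merely contains one of these words next to a real name from being flagged.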

Most occurring characters

Value                 Count     Frequency (%)
i                     562017    14.4%
e                     500496    12.9%
a                     474982    12.2%
d                     376670    9.7%
o                     212006    5.4%
l                     211977    5.4%
r                     188973    4.9%
n                     186459    4.8%
t                     179603    4.6%
c                     107527    2.8%
Other values (50)     892789    22.9%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      3528570   90.6%
Uppercase Letter      314926    8.1%
Space Separator       25519     0.7%
Other Punctuation     24358     0.6%
Decimal Number        123       < 0.1%
Open Punctuation      1         < 0.1%
Close Punctuation     1         < 0.1%
Dash Punctuation      1         < 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
i                     562017    15.9%
e                     500496    14.2%
a                     474982    13.5%
d                     376670    10.7%
o                     212006    6.0%
l                     211977    6.0%
r                     188973    5.4%
n                     186459    5.3%
t                     179603    5.1%
c                     107527    3.0%
Other values (16)     527860    15.0%

Uppercase Letter
Value                 Count     Frequency (%)
P                     45247     14.4%
C                     31894     10.1%
F                     28184     8.9%
S                     21920     7.0%
A                     21504     6.8%
L                     19683     6.3%
E                     17192     5.5%
T                     16574     5.3%
O                     15064     4.8%
H                     14161     4.5%
Other values (16)     83503     26.5%

Other Punctuation
Value                 Count     Frequency (%)
.                     24354     > 99.9%
?                     3         < 0.1%
,                     1         < 0.1%

Space Separator
Value                 Count     Frequency (%)
(space)               25519     100.0%

Decimal Number
Value                 Count     Frequency (%)
0                     123       100.0%

Open Punctuation
Value                 Count     Frequency (%)
(                     1         100.0%

Close Punctuation
Value                 Count     Frequency (%)
)                     1         100.0%

Dash Punctuation
Value                 Count     Frequency (%)
-                     1         100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 3843496   98.7%
Common                50003     1.3%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
i                     562017    14.6%
e                     500496    13.0%
a                     474982    12.4%
d                     376670    9.8%
o                     212006    5.5%
l                     211977    5.5%
r                     188973    4.9%
n                     186459    4.9%
t                     179603    4.7%
c                     107527    2.8%
Other values (42)     842786    21.9%

Common
Value                 Count     Frequency (%)
(space)               25519     51.0%
.                     24354     48.7%
0                     123       0.2%
?                     3         < 0.1%
(                     1         < 0.1%
)                     1         < 0.1%
-                     1         < 0.1%
,                     1         < 0.1%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 3893499   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
i                     562017    14.4%
e                     500496    12.9%
a                     474982    12.2%
d                     376670    9.7%
o                     212006    5.4%
l                     211977    5.4%
r                     188973    4.9%
n                     186459    4.8%
t                     179603    4.6%
c                     107527    2.8%
Other values (50)     892789    22.9%

genus
Text

Missing

Distinct: 20259
Distinct (%): 3.8%
Missing: 197061
Missing (%): 27.2%
Memory size: 5.5 MiB

Length

Max length: 29
Median length: 23
Mean length: 9.623302436
Min length: 1

Characters and Unicode

Total characters: 5075782
Distinct characters: 58
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5010
Unique (%): 0.9%

Sample

1st row: Damaliscus
2nd row: Acrochordiceras
3rd row: Discocyclina
4th row: Odontaspis
5th row: Enteletes
Value                 Count     Frequency (%)
genus                 56245     10.6%
marrella              12281     2.3%
pterodroma            7305      1.4%
callophoca            3770      0.7%
isurus                3463      0.7%
physeterula           3029      0.6%
carcharhinus          2930      0.6%
australca             2250      0.4%
thambetochen          2208      0.4%
hustedia              2082      0.4%
Other values (20248)  432660    81.9%

Most occurring characters

Value                 Count     Frequency (%)
a                     526234    10.4%
e                     421801    8.3%
i                     409475    8.1%
o                     392073    7.7%
s                     365990    7.2%
r                     360745    7.1%
l                     312289    6.2%
n                     296798    5.8%
u                     263865    5.2%
t                     240334    4.7%
Other values (48)     1486178   29.3%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      4547110   89.6%
Uppercase Letter      527447    10.4%
Space Separator       776       < 0.1%
Other Punctuation     437       < 0.1%
Open Punctuation      5         < 0.1%
Close Punctuation     5         < 0.1%
Decimal Number        2         < 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
a                     526234    11.6%
e                     421801    9.3%
i                     409475    9.0%
o                     392073    8.6%
s                     365990    8.0%
r                     360745    7.9%
l                     312289    6.9%
n                     296798    6.5%
u                     263865    5.8%
t                     240334    5.3%
Other values (16)     957506    21.1%

Uppercase Letter
Value                 Count     Frequency (%)
G                     76336     14.5%
P                     65817     12.5%
C                     58147     11.0%
M                     38384     7.3%
A                     38047     7.2%
S                     34182     6.5%
H                     25893     4.9%
T                     25366     4.8%
B                     24502     4.6%
L                     22782     4.3%
Other values (16)     117991    22.4%

Other Punctuation
Value                 Count     Frequency (%)
.                     431       98.6%
?                     6         1.4%

Space Separator
Value                 Count     Frequency (%)
(space)               776       100.0%

Open Punctuation
Value                 Count     Frequency (%)
(                     5         100.0%

Close Punctuation
Value                 Count     Frequency (%)
)                     5         100.0%

Decimal Number
Value                 Count     Frequency (%)
0                     2         100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 5074557   > 99.9%
Common                1225      < 0.1%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
a                     526234    10.4%
e                     421801    8.3%
i                     409475    8.1%
o                     392073    7.7%
s                     365990    7.2%
r                     360745    7.1%
l                     312289    6.2%
n                     296798    5.8%
u                     263865    5.2%
t                     240334    4.7%
Other values (42)     1484953   29.3%

Common
Value                 Count     Frequency (%)
(space)               776       63.3%
.                     431       35.2%
?                     6         0.5%
(                     5         0.4%
)                     5         0.4%
0                     2         0.2%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 5075782   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
a                     526234    10.4%
e                     421801    8.3%
i                     409475    8.1%
o                     392073    7.7%
s                     365990    7.2%
r                     360745    7.1%
l                     312289    6.2%
n                     296798    5.8%
u                     263865    5.2%
t                     240334    4.7%
Other values (48)     1486178   29.3%

subgenus
Text

Missing

Distinct: 2470
Distinct (%): 11.1%
Missing: 702202
Missing (%): 96.9%
Memory size: 5.5 MiB

Length

Max length: 20
Median length: 17
Mean length: 10.61570878
Min length: 3

Characters and Unicode

Total characters: 236794
Distinct characters: 58
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 735
Unique (%): 3.3%

Sample

1st row: Asterocyclina
2nd row: Radiatrypa
3rd row: Laevidentalium
4th row: Vacoea
5th row: Phyllonotus
Value                 Count     Frequency (%)
nephrolepidina        547       2.5%
lingulella            440       2.0%
lingulepis            430       1.9%
lepidocyclina         379       1.7%
dyoros                329       1.5%
eulepidina            285       1.3%
discocyclina          264       1.2%
vacoea                243       1.1%
chlamys               239       1.1%
proporocyclina        214       1.0%
Other values (2461)   18944     84.9%

Most occurring characters

Value                 Count     Frequency (%)
a                     25775     10.9%
i                     22604     9.5%
o                     18830     8.0%
e                     18657     7.9%
r                     16116     6.8%
l                     16112     6.8%
s                     14304     6.0%
c                     11983     5.1%
t                     11285     4.8%
n                     11277     4.8%
Other values (48)     69851     29.5%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      214453    90.6%
Uppercase Letter      22303     9.4%
Close Punctuation     15        < 0.1%
Space Separator       8         < 0.1%
Dash Punctuation      6         < 0.1%
Other Punctuation     6         < 0.1%
Open Punctuation      3         < 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
a                     25775     12.0%
i                     22604     10.5%
o                     18830     8.8%
e                     18657     8.7%
r                     16116     7.5%
l                     16112     7.5%
s                     14304     6.7%
c                     11983     5.6%
t                     11285     5.3%
n                     11277     5.3%
Other values (16)     47510     22.2%

Uppercase Letter
Value                 Count     Frequency (%)
P                     3361      15.1%
L                     2407      10.8%
C                     1994      8.9%
A                     1878      8.4%
M                     1420      6.4%
S                     1416      6.3%
N                     1385      6.2%
T                     1191      5.3%
D                     1188      5.3%
E                     1176      5.3%
Other values (16)     4887      21.9%

Other Punctuation
Value                 Count     Frequency (%)
.                     5         83.3%
?                     1         16.7%

Close Punctuation
Value                 Count     Frequency (%)
)                     15        100.0%

Space Separator
Value                 Count     Frequency (%)
(space)               8         100.0%

Dash Punctuation
Value                 Count     Frequency (%)
-                     6         100.0%

Open Punctuation
Value                 Count     Frequency (%)
(                     3         100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 236756    > 99.9%
Common                38        < 0.1%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
a                     25775     10.9%
i                     22604     9.5%
o                     18830     8.0%
e                     18657     7.9%
r                     16116     6.8%
l                     16112     6.8%
s                     14304     6.0%
c                     11983     5.1%
t                     11285     4.8%
n                     11277     4.8%
Other values (42)     69813     29.5%

Common
Value                 Count     Frequency (%)
)                     15        39.5%
(space)               8         21.1%
-                     6         15.8%
.                     5         13.2%
(                     3         7.9%
?                     1         2.6%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 236794    100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
a                     25775     10.9%
i                     22604     9.5%
o                     18830     8.0%
e                     18657     7.9%
r                     16116     6.8%
l                     16112     6.8%
s                     14304     6.0%
c                     11983     5.1%
t                     11285     4.8%
n                     11277     4.8%
Other values (48)     69851     29.5%

specificEpithet
Text

Missing

Distinct: 32184
Distinct (%): 6.1%
Missing: 197674
Missing (%): 27.3%
Memory size: 5.5 MiB

Length

Max length: 31
Median length: 21
Mean length: 7.031748141
Min length: 1

Characters and Unicode

Total characters: 3704564
Distinct characters: 44
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10223
Unique (%): 1.9%

Sample

1st row: lunatus
2nd row: hyatti
3rd row: sculpturata
4th row: cuspidata
5th row: rotundobesus
Value                 Count     Frequency (%)
sp                    136976    25.7%
splendens             12400     2.3%
phaeopygia            3232      0.6%
species               2814      0.5%
a                     2244      0.4%
bella                 2150      0.4%
alba                  2016      0.4%
megalodon             1645      0.3%
confluens             1466      0.3%
obscura               1275      0.2%
Other values (32112)  367401    68.9%

Most occurring characters

Value                 Count     Frequency (%)
s                     492867    13.3%
a                     409545    11.1%
i                     366458    9.9%
e                     293309    7.9%
n                     257096    6.9%
p                     241847    6.5%
r                     211113    5.7%
l                     197470    5.3%
u                     185989    5.0%
o                     183662    5.0%
Other values (34)     865208    23.4%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      3692426   99.7%
Space Separator       6785      0.2%
Other Punctuation     3066      0.1%
Decimal Number        1873      0.1%
Dash Punctuation      413       < 0.1%
Math Symbol           1         < 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
s                     492867    13.3%
a                     409545    11.1%
i                     366458    9.9%
e                     293309    7.9%
n                     257096    7.0%
p                     241847    6.5%
r                     211113    5.7%
l                     197470    5.3%
u                     185989    5.0%
o                     183662    5.0%
Other values (16)     853070    23.1%

Decimal Number
Value                 Count     Frequency (%)
1                     923       49.3%
2                     534       28.5%
3                     195       10.4%
4                     89        4.8%
5                     66        3.5%
6                     38        2.0%
7                     19        1.0%
8                     5         0.3%
9                     2         0.1%
0                     2         0.1%

Other Punctuation
Value                 Count     Frequency (%)
.                     3033      98.9%
'                     21        0.7%
?                     6         0.2%
*                     5         0.2%
#                     1         < 0.1%

Space Separator
Value                 Count     Frequency (%)
(space)               6785      100.0%

Dash Punctuation
Value                 Count     Frequency (%)
-                     413       100.0%

Math Symbol
Value                 Count     Frequency (%)
+                     1         100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 3692426   99.7%
Common                12138     0.3%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
s                     492867    13.3%
a                     409545    11.1%
i                     366458    9.9%
e                     293309    7.9%
n                     257096    7.0%
p                     241847    6.5%
r                     211113    5.7%
l                     197470    5.3%
u                     185989    5.0%
o                     183662    5.0%
Other values (16)     853070    23.1%

Common
Value                 Count     Frequency (%)
(space)               6785      55.9%
.                     3033      25.0%
1                     923       7.6%
2                     534       4.4%
-                     413       3.4%
3                     195       1.6%
4                     89        0.7%
5                     66        0.5%
6                     38        0.3%
'                     21        0.2%
Other values (8)      41        0.3%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 3704564   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
s                     492867    13.3%
a                     409545    11.1%
i                     366458    9.9%
e                     293309    7.9%
n                     257096    6.9%
p                     241847    6.5%
r                     211113    5.7%
l                     197470    5.3%
u                     185989    5.0%
o                     183662    5.0%
Other values (34)     865208    23.4%

infraspecificEpithet
Text

Missing

Distinct: 3295
Distinct (%): 20.0%
Missing: 708037
Missing (%): 97.7%
Memory size: 5.5 MiB

Length

Max length: 21
Median length: 18
Mean length: 8.558557465
Min length: 1

Characters and Unicode

Total characters: 140968
Distinct characters: 47
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1244
Unique (%): 7.6%

Sample

1st row: amplexoides
2nd row: grandis
3rd row: canalis
4th row: cooperensis
5th row: pyramidale
Value  Count  Frequency (%)
burchelli  494  3.0%
halli  243  1.5%
a  159  1.0%
pugilla  151  0.9%
spinifera  136  0.8%
b  135  0.8%
antarctica  104  0.6%
bellaplicata  81  0.5%
nasiterna  79  0.5%
minor  78  0.5%
Other values (3272)  14872  90.0%

Most occurring characters

Value  Count  Frequency (%)
a  18791  13.3%
i  14907  10.6%
s  13226  9.4%
e  11648  8.3%
n  10012  7.1%
t  8967  6.4%
r  8880  6.3%
l  8863  6.3%
u  7809  5.5%
o  7067  5.0%
Other values (37)  30798  21.8%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  140678  99.8%
Dash Punctuation  99  0.1%
Decimal Number  63  < 0.1%
Space Separator  61  < 0.1%
Uppercase Letter  43  < 0.1%
Other Punctuation  24  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
a  18791  13.4%
i  14907  10.6%
s  13226  9.4%
e  11648  8.3%
n  10012  7.1%
t  8967  6.4%
r  8880  6.3%
l  8863  6.3%
u  7809  5.6%
o  7067  5.0%
Other values (16)  30508  21.7%
Uppercase Letter
Value  Count  Frequency (%)
T  15  34.9%
B  7  16.3%
D  5  11.6%
L  4  9.3%
F  3  7.0%
I  2  4.7%
G  1  2.3%
S  1  2.3%
E  1  2.3%
A  1  2.3%
Other values (3)  3  7.0%
Decimal Number
Value  Count  Frequency (%)
1  39  61.9%
3  11  17.5%
2  9  14.3%
4  3  4.8%
5  1  1.6%
Dash Punctuation
Value  Count  Frequency (%)
-  99  100.0%
Space Separator
Value  Count  Frequency (%)
(space)  61  100.0%
Other Punctuation
Value  Count  Frequency (%)
.  24  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  140721  99.8%
Common  247  0.2%

Most frequent character per script

Latin
Value  Count  Frequency (%)
a  18791  13.4%
i  14907  10.6%
s  13226  9.4%
e  11648  8.3%
n  10012  7.1%
t  8967  6.4%
r  8880  6.3%
l  8863  6.3%
u  7809  5.5%
o  7067  5.0%
Other values (29)  30551  21.7%
Common
Value  Count  Frequency (%)
-  99  40.1%
(space)  61  24.7%
1  39  15.8%
.  24  9.7%
3  11  4.5%
2  9  3.6%
4  3  1.2%
5  1  0.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  140968  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
a  18791  13.3%
i  14907  10.6%
s  13226  9.4%
e  11648  8.3%
n  10012  7.1%
t  8967  6.4%
r  8880  6.3%
l  8863  6.3%
u  7809  5.5%
o  7067  5.0%
Other values (37)  30798  21.8%

taxonRank
Text

Missing 

Distinct  5
Distinct (%)  < 0.1%
Missing  707802
Missing (%)  97.7%
Memory size  5.5 MiB

Length

Max length  10
Median length  10
Mean length  8.738058183
Min length  5

Characters and Unicode

Total characters  145978
Distinct characters  19
Distinct categories  2
Distinct scripts  1
Distinct blocks  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
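
The mean length reported for this variable appears consistent with dividing the total character count by the number of non-missing values. The sketch below reproduces that arithmetic from figures printed in this report (724508 observations, 707802 missing, 145978 total characters). Whether the profiler computes the mean exactly this way is an assumption.

```python
# Figures copied from this report (dataset overview and the taxonRank section).
observations = 724508
missing = 707802
total_characters = 145978

# Assumption: mean length = total characters / non-missing values.
non_missing = observations - missing
mean_length = total_characters / non_missing

print(non_missing, round(mean_length, 6))
```

The non-missing count of 16706 also matches the sum of the five value counts listed for this variable (9791 + 6728 + 134 + 37 + 16).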

Unique

Unique  0
Unique (%)  0.0%

Sample

1st row  subspecies
2nd row  variety
3rd row  subspecies
4th row  variety
5th row  subspecies
Value  Count  Frequency (%)
subspecies  9791  58.6%
variety  6728  40.3%
forma  134  0.8%
morpha  37  0.2%
clade  16  0.1%

Most occurring characters

Value  Count  Frequency (%)
s  29373  20.1%
e  26326  18.0%
i  16519  11.3%
p  9828  6.7%
b  9791  6.7%
c  9791  6.7%
u  9791  6.7%
a  6915  4.7%
r  6899  4.7%
v  6728  4.6%
Other values (9)  14017  9.6%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  145962  > 99.9%
Uppercase Letter  16  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
s  29373  20.1%
e  26326  18.0%
i  16519  11.3%
p  9828  6.7%
b  9791  6.7%
c  9791  6.7%
u  9791  6.7%
a  6915  4.7%
r  6899  4.7%
v  6728  4.6%
Other values (8)  14001  9.6%
Uppercase Letter
Value  Count  Frequency (%)
C  16  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  145978  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
s  29373  20.1%
e  26326  18.0%
i  16519  11.3%
p  9828  6.7%
b  9791  6.7%
c  9791  6.7%
u  9791  6.7%
a  6915  4.7%
r  6899  4.7%
v  6728  4.6%
Other values (9)  14017  9.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  145978  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
s  29373  20.1%
e  26326  18.0%
i  16519  11.3%
p  9828  6.7%
b  9791  6.7%
c  9791  6.7%
u  9791  6.7%
a  6915  4.7%
r  6899  4.7%
v  6728  4.6%
Other values (9)  14017  9.6%
Distinct  7319
Distinct (%)  1.8%
Missing  325030
Missing (%)  44.9%
Memory size  5.5 MiB

Length

Max length  103
Median length  51
Mean length  9.144288296
Min length  2

Characters and Unicode

Total characters  3652942
Distinct characters  76
Distinct categories  7
Distinct scripts  2
Distinct blocks  2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
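
This variable is the only one in this part of the report with characters outside the ASCII block. As a minimal sketch of how such characters could be located and identified, the snippet below counts non-ASCII characters and looks up their Unicode names with the standard `unicodedata` module. The helper `non_ascii_counts` and the strings "Müller" and "Růžička" are hypothetical illustrations, not values taken from the dataset, and the snippet does not reproduce the profiler's own block lookup.

```python
import unicodedata
from collections import Counter

def non_ascii_counts(values):
    """Count each non-ASCII character and pair it with its Unicode name."""
    counts = Counter(ch for value in values for ch in value if not ch.isascii())
    return {ch: (n, unicodedata.name(ch, "UNKNOWN")) for ch, n in counts.items()}

# "Müller" and "Růžička" are made-up inputs; "(Cushman)" appears in the sample rows.
examples = ["Müller", "Růžička", "(Cushman)"]
for ch, (n, name) in non_ascii_counts(examples).items():
    print(ch, n, name)
```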

Unique

Unique  1579
Unique (%)  0.4%

Sample

1st row  Meek
2nd row  (Cushman)
3rd row  (Agassiz)
4th row  Cooper & Grant
5th row  Cuvier
ValueCountFrequency (%)
77310
 
13.1%
walcott 26311
 
4.5%
cooper 26282
 
4.4%
cushman 17375
 
2.9%
grant 16892
 
2.9%
ulrich 12249
 
2.1%
et 9463
 
1.6%
al 9463
 
1.6%
hall 8176
 
1.4%
bassler 5943
 
1.0%
Other values (4208) 381568
64.6%

Most occurring characters

Value  Count  Frequency (%)
e  302103  8.3%
a  256596  7.0%
o  243833  6.7%
r  239853  6.6%
n  225453  6.2%
l  204010  5.6%
(space)  191554  5.2%
t  170449  4.7%
i  153159  4.2%
s  150944  4.1%
Other values (66)  1514988  41.5%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  2653589  72.6%
Uppercase Letter  500242  13.7%
Space Separator  191554  5.2%
Open Punctuation  106388  2.9%
Close Punctuation  106388  2.9%
Other Punctuation  92459  2.5%
Dash Punctuation  2322  0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e  302103  11.4%
a  256596  9.7%
o  243833  9.2%
r  239853  9.0%
n  225453  8.5%
l  204010  7.7%
t  170449  6.4%
i  153159  5.8%
s  150944  5.7%
h  100899  3.8%
Other values (31)  606290  22.8%
Uppercase Letter
Value  Count  Frequency (%)
C  79537  15.9%
W  50766  10.1%
G  43481  8.7%
S  39975  8.0%
B  33936  6.8%
M  30616  6.1%
H  30047  6.0%
L  27178  5.4%
R  20489  4.1%
P  15813  3.2%
Other values (16)  128404  25.7%
Other Punctuation
Value  Count  Frequency (%)
&  77309  83.6%
.  9415  10.2%
'  5434  5.9%
,  300  0.3%
?  1  < 0.1%
Space Separator
Value  Count  Frequency (%)
(space)  191554  100.0%
Open Punctuation
Value  Count  Frequency (%)
(  106388  100.0%
Close Punctuation
Value  Count  Frequency (%)
)  106388  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-  2322  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  3153831  86.3%
Common  499111  13.7%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e  302103  9.6%
a  256596  8.1%
o  243833  7.7%
r  239853  7.6%
n  225453  7.1%
l  204010  6.5%
t  170449  5.4%
i  153159  4.9%
s  150944  4.8%
h  100899  3.2%
Other values (57)  1106532  35.1%
Common
Value  Count  Frequency (%)
(space)  191554  38.4%
(  106388  21.3%
)  106388  21.3%
&  77309  15.5%
.  9415  1.9%
'  5434  1.1%
-  2322  0.5%
,  300  0.1%
?  1  < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3650564  99.9%
None  2378  0.1%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
e  302103  8.3%
a  256596  7.0%
o  243833  6.7%
r  239853  6.6%
n  225453  6.2%
l  204010  5.6%
(space)  191554  5.2%
t  170449  4.7%
i  153159  4.2%
s  150944  4.1%
Other values (50)  1512610  41.4%
None
Value  Count  Frequency (%)
ú  939  39.5%
ö  833  35.0%
ž  158  6.6%
å  99  4.2%
ë  95  4.0%
ä  74  3.1%
ü  64  2.7%
é  48  2.0%
ó  17  0.7%
ñ  16  0.7%
Other values (6)  35  1.5%